Connectionists: software release: Bayesian DAG learning

Kevin Murphy murphyk2 at gmail.com
Sun Sep 16 18:51:28 EDT 2007


I am pleased to announce the release of "BDAGL" (pronounced
"be-daggle"), a Matlab/C package for learning Bayes net structures
from fully observed data (discrete or continuous, static or time
series). Its main novelty is that it implements various algorithms for
exact Bayesian inference of posterior features/modes using dynamic
programming. It also supports MCMC (on DAGs and orders) with various
proposal distributions. Details are given at the URL below. Feedback
is welcome.

Kevin Murphy

http://www.cs.ubc.ca/~murphyk/Software/BDAGL/index.html


Major features

- Computes the most probable graph G_map = arg max_G p(G|D) exactly,
where G is a DAG and D is the data, using the dynamic programming (DP)
algorithm of Silander & Myllymaki, UAI'06. This takes O(d 2^d) time
and space, so it is limited to about 20 variables. Running time ranges
from about 5 seconds for d=10 to about 5 minutes for d=20.

- Computes exact edge marginals using Bayesian model averaging,
p(G_{ij}=1|D) = sum_G I(G_{ij}=1) p(G|D),
using the DP algorithms of Koivisto & Sood, JMLR'04, and Koivisto,
UAI'06. This takes O(d 2^d) time and space, so it is limited to about
20 variables. Running time ranges from about 5 seconds for d=10 to
about 5 minutes for d=20.

- Computes edge marginals p(G_{ij}=1|D) (or other posterior
features) approximately using MCMC in the space of DAGs with various
proposal distributions. Options include the standard local proposal
(add/delete/reverse edge) and a global proposal based on DP (see
Eaton & Murphy, UAI'07).
It also supports MCMC in the space of total orders (see Friedman &
Koller, MLJ'03) and Gibbs sampling on the adjacency matrix.
In principle, these algorithms avoid the 2^d bottleneck of exact DP,
although the current implementation may not scale much beyond d=20.

- Supports various models of intervention (perfect, imperfect,
uncertain, soft) for learning causal networks from experimental data;
see Eaton & Murphy, AIStats'07 for details.

- Supports the BDe score for Multinomial models with Dirichlet priors,
and the BGe score for Gaussian models with Gaussian-Gamma priors.
It could be extended to more flexible CPDs/priors using BIC.

- Supports DBN learning from (fully observed) time series data.

- Supports posterior predictive density modeling for test data,
integrating over structures and parameters. (Also supports plug-in
approximation.)

- Efficiently computes the expected sufficient statistics for discrete
CPDs from large data sets using ADtrees (see Moore & Lee, JAIR'98).
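
To make the BDe score in the feature list concrete, here is a minimal
Python sketch of the local log marginal likelihood of one node given a
parent set, using the common uniform ("BDeu") variant of the Dirichlet
prior. This is not BDAGL code: the function name bdeu_local_score, its
arguments, and the equivalent sample size ess are assumptions made for
illustration, and counts are gathered by a plain scan rather than with
ADtrees.

```python
from collections import Counter
from math import lgamma


def bdeu_local_score(data, child, parents, arities, ess=1.0):
    """Log marginal likelihood of `child` given `parents` (BDeu prior).

    data    : list of tuples of ints, one tuple per fully observed case
    arities : arities[v] is the number of states of variable v
    ess     : equivalent sample size of the Dirichlet prior (assumed)
    """
    r = arities[child]                 # number of child states
    q = 1                              # number of parent configurations
    for p in parents:
        q *= arities[p]
    a_j = ess / q                      # prior mass per parent configuration
    a_jk = ess / (q * r)               # prior mass per (configuration, state)

    counts = Counter()                 # (parent config, child state) -> N_jk
    totals = Counter()                 # parent config -> N_j
    for row in data:
        cfg = tuple(row[p] for p in parents)
        counts[(cfg, row[child])] += 1
        totals[cfg] += 1

    # Unobserved configurations and states contribute exactly zero,
    # so only the observed ones are visited.
    score = 0.0
    for n_j in totals.values():
        score += lgamma(a_j) - lgamma(a_j + n_j)
    for n_jk in counts.values():
        score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score


# Toy data in which variable 1 always copies variable 0.
data = [(0, 0), (0, 0), (1, 1), (1, 1)]
arities = [2, 2]
with_parent = bdeu_local_score(data, 1, (0,), arities)
without_parent = bdeu_local_score(data, 1, (), arities)
```

On this toy data the score with parent 0 is higher than the score with
no parents, as one would hope when the child is a copy of the parent.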

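The exact MAP computation in the first feature can be illustrated with
a small Python sketch in the spirit of the Silander & Myllymaki DP:
first find the best parent set of each node within every candidate
set, then find the best sink of every node subset, then peel sinks off
to recover the graph. This is a simplified illustration, not the BDAGL
implementation; map_dag, toy_score, and the bitmask encoding of parent
sets are assumptions, and the local scores here are made up rather
than computed from data.

```python
def map_dag(d, local_score):
    """Return (best total log score, parents), parents[v] being a bitmask.

    local_score(v, mask) is the log score of node v with the parent set
    encoded by `mask` (bit u set means u is a parent of v).
    """
    full = 1 << d

    # Step 1: best parent set of v drawn from each candidate set C.
    best_ps = [dict() for _ in range(d)]     # v -> {C: (score, mask)}
    for v in range(d):
        for C in range(full):
            if C >> v & 1:
                continue                      # v cannot be its own parent
            best = (local_score(v, C), C)
            # Either use all of C, or the best choice within C minus one node.
            for u in range(d):
                if C >> u & 1:
                    cand = best_ps[v][C & ~(1 << u)]
                    if cand[0] > best[0]:
                        best = cand
            best_ps[v][C] = best

    # Step 2: best network on each subset W, found via its best sink.
    total = [0.0] * full
    sink = [-1] * full
    for W in range(1, full):
        best_val = None
        for v in range(d):
            if W >> v & 1:
                rest = W & ~(1 << v)
                val = total[rest] + best_ps[v][rest][0]
                if best_val is None or val > best_val:
                    best_val, sink[W] = val, v
        total[W] = best_val

    # Step 3: peel off sinks to read out the optimal parent sets.
    parents = [0] * d
    W = full - 1
    while W:
        v = sink[W]
        W &= ~(1 << v)
        parents[v] = best_ps[v][W][1]
    return total[full - 1], parents


def toy_score(v, mask):
    # Hypothetical scores: reward edges 0->1 and 1->2, charge 1 per parent.
    bonus = {(1, 0): 3.0, (2, 1): 3.0}
    return sum(bonus.get((v, u), 0.0) - 1.0 for u in range(3) if mask >> u & 1)


score, parents = map_dag(3, toy_score)
```

The tables hold one entry per node and candidate set, which is where
the exponential growth in d comes from.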

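As a rough illustration of MCMC over DAGs with a local proposal, the
Python sketch below runs Metropolis-Hastings where each step toggles
one randomly chosen directed edge and rejects any move that creates a
cycle. It is a simplification of the add/delete/reverse proposal
mentioned above (edge reversal is omitted so that the proposal stays
symmetric), and it is not BDAGL code: mcmc_edge_marginals, log_post,
and the toy weights are all hypothetical stand-ins, with log_post
playing the role of an unnormalized log p(G|D).

```python
import math
import random


def is_acyclic(adj):
    """Repeatedly remove nodes that have no incoming edge from the rest."""
    remaining = set(range(len(adj)))
    while remaining:
        roots = [v for v in remaining
                 if not any(adj[u][v] for u in remaining)]
        if not roots:
            return False                 # every remaining node lies on a cycle path
        remaining -= set(roots)
    return True


def mcmc_edge_marginals(d, log_post, n_samples, burn_in=1000, seed=0):
    """Estimate p(G_ij = 1 | D) by averaging sampled adjacency matrices."""
    rng = random.Random(seed)
    adj = [[0] * d for _ in range(d)]    # start from the empty graph
    cur = log_post(adj)
    counts = [[0] * d for _ in range(d)]
    for t in range(burn_in + n_samples):
        i, j = rng.sample(range(d), 2)   # ordered pair with i != j
        adj[i][j] ^= 1                   # propose: toggle edge i -> j
        accepted = False
        if is_acyclic(adj):
            new = log_post(adj)
            if rng.random() < math.exp(min(0.0, new - cur)):
                cur, accepted = new, True
        if not accepted:
            adj[i][j] ^= 1               # undo the proposal
        if t >= burn_in:
            for u in range(d):
                for v in range(d):
                    counts[u][v] += adj[u][v]
    return [[c / n_samples for c in row] for row in counts]


# Toy unnormalized log posterior: favors edge 0->1, penalizes all others.
weights = [[0.0, 3.0, -3.0], [-3.0, 0.0, -3.0], [-3.0, -3.0, 0.0]]


def log_post(adj):
    return sum(weights[u][v] for u in range(3) for v in range(3) if adj[u][v])


marginals = mcmc_edge_marginals(3, log_post, n_samples=5000)
```

With these toy weights, marginals[0][1] should come out close to 1 and
the other entries close to 0; a real run would replace log_post with a
score computed from data.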