Review of Markov chain Monte Carlo methods

Radford Neal <radford@cs.toronto.edu>
Tue Nov 23 12:05:43 EST 1993


The following review paper is now available via FTP, as described below.


       PROBABILISTIC INFERENCE USING MARKOV CHAIN MONTE CARLO METHODS

                              Radford M. Neal
                     Department of Computer Science 
                          University of Toronto 

                            25 September 1993

    Probabilistic inference is an attractive approach to uncertain
    reasoning and empirical learning in artificial intelligence.
    Computational difficulties arise, however, because probabilistic
    models with the necessary realism and flexibility lead to complex
    distributions over high-dimensional spaces.
    
    Related problems in other fields have been tackled using Monte Carlo
    methods based on sampling with Markov chains, providing a rich array
    of techniques that can be applied to problems in artificial
    intelligence.  The ``Metropolis algorithm'' has been used to solve
    difficult problems in statistical physics for over forty years, and,
    in the last few years, the related method of ``Gibbs sampling'' has
    been applied to problems of statistical inference.  Concurrently, an
    alternative method for solving problems in statistical physics by
    means of dynamical simulation has been developed, and has recently
    been unified with the Metropolis algorithm to produce the ``hybrid
    Monte Carlo'' method.  In computer science, Markov chain
    sampling is the basis of the heuristic optimization technique of
    ``simulated annealing'', and has recently been used in randomized
    algorithms for approximate counting of large sets.
    
    In this review, I outline the role of probabilistic inference in
    artificial intelligence, present the theory of Markov chains, and
    describe various Markov chain Monte Carlo algorithms, along with a
    number of supporting techniques. I try to present a comprehensive
    picture of the range of methods that have been developed, including
    techniques from the varied literature that have not yet seen wide
    application in artificial intelligence, but which appear relevant. As
    illustrative examples, I use the problems of probabilistic inference
    in expert systems, discovery of latent classes from data, and Bayesian
    learning for neural networks.
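
As a small taste of the simplest of these methods, the sketch below
implements a random-walk Metropolis sampler in Python.  The Gaussian
target, step size, and chain length are illustrative choices for this
announcement, not taken from the paper, which treats the algorithms in
full generality.

    import math
    import random

    def log_target(x):
        # Unnormalized log density of the target distribution; here a
        # standard one-dimensional Gaussian, chosen only for illustration.
        return -0.5 * x * x

    def metropolis(n_steps, step_size=1.0, x0=0.0):
        # Random-walk Metropolis: propose a symmetric perturbation of
        # the current state and accept it with probability
        # min(1, pi(x_new)/pi(x)); otherwise keep the current state.
        x = x0
        samples = []
        for _ in range(n_steps):
            x_new = x + random.uniform(-step_size, step_size)
            log_ratio = log_target(x_new) - log_target(x)
            if random.random() < math.exp(min(0.0, log_ratio)):
                x = x_new
            samples.append(x)
        return samples

    chain = metropolis(10000)
    print(sum(chain) / len(chain))   # should be near 0 for this target

Gibbs sampling replaces the accept/reject step by sampling each
component directly from its conditional distribution given the others,
while the hybrid Monte Carlo method replaces the random-walk proposal
with one obtained by simulating Hamiltonian dynamics.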


                             TABLE OF CONTENTS

    1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . .   1 

    2 Probabilistic Inference for Artificial Intelligence. . . . . .   4 
      2.1 Probabilistic inference with a fully-specified model . . .   5 
      2.2 Statistical inference for model parameters . . . . . . . .  13 
      2.3 Bayesian model comparison. . . . . . . . . . . . . . . . .  23 
      2.4 Statistical physics. . . . . . . . . . . . . . . . . . . .  25 

    3 Background on the Problem and its Solution . . . . . . . . . .  30 
      3.1 Definition of the problem. . . . . . . . . . . . . . . . .  30 
      3.2 Approaches to solving the problem. . . . . . . . . . . . .  32 
      3.3 Theory of Markov chains . . . . . . . . . . . . . . . . .  36 

    4 The Metropolis and Gibbs Sampling Algorithms . . . . . . . . .  47 
      4.1 Gibbs sampling . . . . . . . . . . . . . . . . . . . . . .  47 
      4.2 The Metropolis algorithm . . . . . . . . . . . . . . . . .  54 
      4.3 Variations on the Metropolis algorithm . . . . . . . . . .  59 
      4.4 Analysis of the Metropolis and Gibbs sampling algorithms .  64 

    5 The Dynamical and Hybrid Monte Carlo Methods . . . . . . . . .  70 
      5.1 The stochastic dynamics method . . . . . . . . . . . . . .  70 
      5.2 The hybrid Monte Carlo algorithm . . . . . . . . . . . . .  77 
      5.3 Other dynamical methods. . . . . . . . . . . . . . . . . .  81 
      5.4 Analysis of the hybrid Monte Carlo algorithm . . . . . . .  83 

    6 Extensions and Refinements . . . . . . . . . . . . . . . . . .  87 
      6.1 Simulated annealing. . . . . . . . . . . . . . . . . . . .  87 
      6.2 Free energy estimation . . . . . . . . . . . . . . . . . .  94 
      6.3 Error assessment and reduction . . . . . . . . . . . . . . 102 
      6.4 Parallel implementation. . . . . . . . . . . . . . . . . . 114 

    7 Directions for Research. . . . . . . . . . . . . . . . . . . . 116 
      7.1 Improvements in the algorithms . . . . . . . . . . . . . . 116 
      7.2 Scope for applications . . . . . . . . . . . . . . . . . . 118 

    8 Annotated Bibliography . . . . . . . . . . . . . . . . . . . . 121 

                         Total length: 144 pages


The paper may be obtained in PostScript form as follows:

    unix> ftp ftp.cs.toronto.edu
          (log in as user 'anonymous', with your e-mail address as the password)

    ftp> cd pub/radford
    ftp> binary
    ftp> get review.ps.Z
    ftp> quit

    unix> uncompress review.ps.Z
    unix> lpr review.ps (or however you print PostScript)

The files review[0123].ps.Z in the same directory contain the same paper
in smaller chunks; these may prove useful if your printer cannot digest
the paper all at once.

    Radford Neal

    radford@cs.toronto.edu

