Hyperparameters: optimise, or integrate out?
David J.C. MacKay
mackay at mrao.cam.ac.uk
Sat Jun 26 12:34:00 EDT 1993
The following preprint is now available by anonymous ftp.
========================================================================
Hyperparameters: optimise, or integrate out?
David J.C. MacKay
University of Cambridge
Cavendish Laboratory
Madingley Road
Cambridge CB3 0HE
mackay at mrao.cam.ac.uk
I examine two computational methods for the implementation of Bayesian
hierarchical models, that is, models which include unknown
hyperparameters such as regularisation constants. In the `evidence
framework' the model parameters are {\em integrated} over, and the
resulting evidence is {\em maximised} over the hyperparameters. In the
alternative `MAP' method, the `true posterior probability' is found by
{\em integrating} over the hyperparameters, and this is then {\em
maximised} over the model parameters. The similarities of the two
approaches, and their relative merits, are discussed. It is shown
that, in severely ill-posed problems, significant biases arise in the
second method.
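In symbols (a sketch in generic notation of my own choosing, which may
differ from the paper's): writing $w$ for the model parameters, $\alpha$
for a hyperparameter, and $D$ for the data, the evidence framework
computes
\[
  P(D \mid \alpha) = \int P(D \mid w)\, P(w \mid \alpha)\, dw ,
  \qquad
  \hat{\alpha} = \arg\max_{\alpha} P(D \mid \alpha) ,
\]
and then infers $w$ from $P(w \mid D, \hat{\alpha})$; the MAP method
instead forms the true posterior by integrating out $\alpha$, and then
maximises it over $w$:
\[
  P(w \mid D) = \int P(w \mid \alpha, D)\, P(\alpha \mid D)\, d\alpha ,
  \qquad
  \hat{w} = \arg\max_{w} P(w \mid D) .
\]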
========================================================================
The preprint "Hyperparameters: optimise, or integrate out?"
may be obtained as follows:
ftp 131.111.48.8
anonymous
(your name)
cd pub/mackay
binary
get alpha.ps.Z
quit
uncompress alpha.ps.Z
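For readers who prefer a scripted transfer, here is a minimal sketch
using Python's standard ftplib module (host, directory, and filename
taken from the instructions above):

    from ftplib import FTP

    # Anonymous login to the FTP server given above.
    ftp = FTP('131.111.48.8')
    ftp.login()  # user defaults to 'anonymous'

    # Change to the preprint directory and fetch the compressed
    # PostScript file in binary mode.
    ftp.cwd('pub/mackay')
    with open('alpha.ps.Z', 'wb') as f:
        ftp.retrbinary('RETR alpha.ps.Z', f.write)
    ftp.quit()

Then run uncompress alpha.ps.Z as before.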
This document is 16 pages long.
Table of contents:
Outline
Making inferences
The ideal approach
The Evidence framework
The MAP method
The effective $\alpha$ of the general MAP method
Pros and cons
In favour of the MAP method
Magnifying the differences
An example
The curvature of the true prior, and MAP error bars
Discussion
Appendices:
Conditions for the evidence approximation
Distance between probability distributions
A method for evaluating distances $D(p(t), q(t))$
What I mean by saying that the approximation `works'
Predictions
The evidence
$\sigma_N$ and $\sigma_{N-1}$