Announcement of NIPS Bayesian workshop and associated ftp archive
David MacKay
mackay@hope.caltech.edu
Tue Nov 5 13:20:59 EST 1991
One of the two-day workshops at Vail this year will be:
`Developments in Bayesian methods for neural networks'
------------------------------------------------------
David MacKay and Steve Nowlan, organizers
The first day of this workshop will be 50% tutorial in content,
reviewing some new ways Bayesian methods may be applied to neural
networks.
The rest of the workshop will be devoted to discussions of
the frontiers and challenges facing Bayesian work in neural networks.
Participants are encouraged to obtain preprints by anonymous ftp
before the workshop; instructions are given at the end of this message.
Discussion will be moderated by John Bridle.
Day 1, Morning: Tutorial review.
  0 Introduction to Bayesian data modelling.           David MacKay
  1 E-M, clustering and mixtures.                      Steve Nowlan
  2 Bayesian model comparison and determination of
    regularization constants - application to
    regression networks.                               David MacKay
  3 The use of mixture decay schemes in backprop
    networks.                                          Steve Nowlan

Day 1, Evening: Tutorial continued.
  4 The `evidence' framework for classification
    networks.                                          David MacKay
Day 1, Evening: Frontier Discussion.
Background:
A:
In many cases the true Bayesian posterior distribution over a hypothesis
or parameter space is difficult to obtain analytically. Monte Carlo
methods may provide a useful and computationally efficient way to estimate
posterior distributions in such cases.
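For orientation only, a minimal sketch of the idea in Python, assuming a
generic one-parameter model; the function bodies and constants below are
purely illustrative and are not taken from any speaker's work.

# Minimal Metropolis sampler sketch (illustrative only): draw samples from
# a posterior known only up to a normalising constant, via its log density.
import math, random

def log_posterior(w):
    # Hypothetical un-normalised log posterior; in practice this would be
    # the model's log likelihood plus log prior.
    return -0.5 * (w - 1.0) ** 2

def metropolis(n_samples, step=0.5, w=0.0):
    samples = []
    for _ in range(n_samples):
        w_new = w + random.gauss(0.0, step)       # propose a random move
        log_accept = log_posterior(w_new) - log_posterior(w)
        if random.random() < math.exp(min(0.0, log_accept)):
            w = w_new                             # accept with prob min(1, ratio)
        samples.append(w)
    return samples

# Posterior expectations are then estimated by simple averages over samples:
draws = metropolis(10000)
print(sum(draws) / len(draws))    # approximates the posterior mean of w
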
B:
There are many applications where training data is expensive to obtain,
and it is desirable to select training examples so that we learn as much
as possible from each one. This session will discuss approaches for
selecting the next training point "optimally". The same approaches may
also be useful for reducing the size of a large data set by omitting
uninformative data points. (A short illustrative sketch follows the
session listing below.)
  A Monte Carlo clustering                             Radford Neal
  B Data selection / active query learning             Jurgen Schmidhuber,
                                                       David MacKay
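As a point of reference for session B, here is a toy Python sketch of one
common heuristic for choosing the next query point: under a Bayesian
linear-regression model, pick the candidate input whose prediction is most
uncertain under the current posterior. All names, priors and constants
below are illustrative assumptions, not any speaker's actual method.

# Toy active-data-selection sketch (illustrative only).
import numpy as np

def next_query(candidates, X, alpha=1.0, beta=10.0):
    # Posterior precision of the weights: A = alpha*I + beta * X^T X
    d = X.shape[1]
    A = alpha * np.eye(d) + beta * X.T @ X
    A_inv = np.linalg.inv(A)
    # Predictive variance at each candidate x: 1/beta + x^T A^{-1} x
    variances = [1.0 / beta + x @ A_inv @ x for x in candidates]
    return int(np.argmax(variances))    # index of the most informative point

# Example: two inputs already seen, three candidates to choose from.
X = np.array([[1.0, 0.0], [1.0, 0.1]])
candidates = np.array([[1.0, 0.05], [1.0, 1.0], [1.0, 2.0]])
print(next_query(candidates, X))        # picks the candidate far from the data
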
Day 2, Morning discussion:
C Prediction of generalisation
Background:
The Bayesian approach to model comparison evaluates
how PROBABLE alternative models are given the data.
In contrast, the real problem is often to estimate
HOW WELL EACH MODEL IS EXPECTED TO GENERALISE.
In this session we will hear about various approaches
to predicting generalisation; a small cross-validation sketch follows
the speaker list below.
It is hoped that the discussion will shed light on the questions:
- How does Bayesian model comparison relate to generalisation?
- Can we predict generalisation ability of one model assuming
that the `truth' is in a different specified model class?
- Is it possible to predict generalisation ability WITHOUT
making implicit assumptions about the properties
of the `truth'?
- Can we interpret GCV (generalised cross-validation) in terms of
  prediction of generalisation?
  1 Prediction of generalisation with `GPE'            John Moody
  2 Prediction of generalisation
    - worst + average case analysis        David Haussler + Michael Kearns
  3 News from the statistical physics front            Sara Solla
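As background to the cross-validation question above, a minimal k-fold
sketch in Python (assuming a straight-line least-squares fit; the model,
data and fold count below are illustrative only):

# Toy k-fold cross-validation sketch: estimate generalisation error by
# averaging held-out errors. In practice the indices should be shuffled.
import numpy as np

def cv_error(x, t, fit, predict, k=5):
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        params = fit(x[train], t[train])                # fit on k-1 folds
        resid = t[held_out] - predict(params, x[held_out])
        errors.append(np.mean(resid ** 2))              # held-out squared error
    return np.mean(errors)

# Example with a straight-line model t = a*x + b:
fit = lambda x, t: np.polyfit(x, t, 1)
predict = lambda p, x: np.polyval(p, x)
x = np.linspace(0.0, 1.0, 20)
t = 2.0 * x + 0.1 * np.random.randn(20)
print(cv_error(x, t, fit, predict))     # estimate of expected squared error
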
Day 2, Evening discussion:
(Note: There will probably be time in this final session for continued
discussion from the other sessions.)
D Missing inputs, unlabelled data and discriminative training
Background:
When training a classifier with a data set D_1 = {x,t},
a full probability model is one which assigns a
parameterised probability P(x,t|w). However, many classifiers
only produce a discriminant P(t|x,w), i.e. they do not model P(x).
Furthermore, classifiers of the first type often yield better
discriminative performance if they are trained as if they were only of
the second type. This is called `discriminative training'. The problem
with discriminative training is that it leaves us with no obvious
way to use UNLABELLED data D_2 = {x}. Such data is usually cheap,
but how can we integrate it with discriminative training?
The same problem arises for most regression or classification
models when some of the input variables are missing from the input
vector. What is the right thing to do? (A toy sketch of combining
labelled and unlabelled likelihood terms follows the speaker list below.)
  1 Introduction: the problem of combining unlabelled
    data and discriminative training                   Steve Renals
  2 Combining labelled and unlabelled data
    for the modem problem                              Steve Nowlan
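To make the background concrete, here is a toy Python sketch of one way
labelled and unlabelled data can enter a full probability model: labelled
points contribute log P(x,t|w), while unlabelled points contribute the
marginal log sum_t P(x,t|w). The two-class Gaussian model and its
parameters are purely illustrative assumptions, not any speaker's approach.

# Toy sketch of a likelihood that uses both labelled and unlabelled data.
import math

def log_joint(x, t, w):
    # Hypothetical two-class model: class priors and unit-variance Gaussian
    # class-conditional densities with means given in w.
    prior, mean = w["prior"][t], w["mean"][t]
    return math.log(prior) - 0.5 * (x - mean) ** 2 - 0.5 * math.log(2 * math.pi)

def log_likelihood(labelled, unlabelled, w):
    ll = sum(log_joint(x, t, w) for x, t in labelled)
    for x in unlabelled:                  # marginalise over the unknown label
        ll += math.log(sum(math.exp(log_joint(x, t, w)) for t in (0, 1)))
    return ll

w = {"prior": [0.5, 0.5], "mean": [-1.0, 1.0]}
labelled = [(-1.2, 0), (0.9, 1)]
unlabelled = [-0.8, 1.1]
print(log_likelihood(labelled, unlabelled, w))
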
Reading up before the workshop
------------------------------
People intending to attend this workshop are encouraged to obtain
preprints of relevant material before NIPS. A selection of preprints
is available by anonymous ftp, as follows:
unix> ftp hope.caltech.edu (or ftp 131.215.4.231)
Name: anonymous
Password: <your name>
ftp> cd pub/mackay
ftp> get README.NIPS
ftp> quit
Then read the file README.NIPS for further information.
Problems? Contact David MacKay, mackay@hope.caltech.edu,
or Steve Nowlan, nowlan@helmholtz.sdsc.edu