two new papers on back-prop available from neuroprose
Wray Buntine
wray@ptolemy.arc.nasa.gov
Fri Jul 5 14:19:46 EDT 1991
The following two reports are currently under journal review and
have been made available on the "/pub/neuroprose" archive. Those
unable to access the archive should send requests to the address below.
Both papers are intended as a guide for the "theoretically-aware
practitioner/algorithm-designer intent on building a better algorithm".
Wray Buntine
NASA Ames Research Center        phone: (415) 604 3389
Mail Stop 244-17
Moffett Field, CA 94035          email: wray@ptolemy.arc.nasa.gov
----------------
Bayesian Back-Propagation
by Wray L. Buntine and Andreas S. Weigend
available as
/pub/neuroprose/buntine.bayes1.ps.Z (pages 1-17)
/pub/neuroprose/buntine.bayes2.ps.Z (pages 1-34)
Connectionist feed-forward networks, trained with back-propagation,
can be used both for non-linear regression and for (discrete
one-of-$C$) classification, depending on the form of training. This
paper works through approximate Bayesian methods for both of these
problems. Methods are presented for various statistical components of
back-propagation: choosing the appropriate cost function and
regularizer (interpreted as a prior), eliminating extra weights,
estimating the uncertainty of the remaining weights, predicting for
new patterns (``out-of-sample''), estimating the uncertainty in the
choice of this prediction (``error bars''), estimating the
generalization error, comparing different network structures, and
adjustments for missing values in the training patterns. These
methods refine and extend some popular heuristic techniques
suggested in the literature, and in most cases require only a small
additional factor of computation during back-propagation, or some
extra computation once back-propagation has finished. The paper begins with
a comparative discussion of Bayesian and related frameworks for the
training problem.
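
As a rough illustration of the first of these components (a sketch of
the general idea, not code from the paper), the familiar sum-of-squares
cost with a weight-decay penalty can be read as a negative log
posterior: a Gaussian noise model on the targets gives the error term
and a zero-mean Gaussian prior on the weights gives the regularizer.
The network function f, the noise variance sigma2 and the prior
variance sigma2_w below are all assumed for illustration:

  import numpy as np

  def neg_log_posterior(weights, f, X, y, sigma2=1.0, sigma2_w=10.0):
      # Sum-of-squares error plus weight decay, read as a negative log
      # posterior: Gaussian noise of variance sigma2 on the targets
      # (the likelihood) and a zero-mean Gaussian prior of variance
      # sigma2_w on the weights (the regularizer).  Constants dropped.
      residuals = y - f(X, weights)                          # prediction errors
      data_term = np.sum(residuals ** 2) / (2.0 * sigma2)    # -log likelihood
      prior_term = np.sum(weights ** 2) / (2.0 * sigma2_w)   # -log prior
      return data_term + prior_term

Minimizing this is just back-propagation with weight decay; the
Bayesian reading is what the paper builds on to attach variances and
error bars to the fitted weights.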
Contents:
1. Introduction
2. On Bayesian methods
3. Multi-Layer networks
4. Probabilistic neural networks
4.1. Logistic networks
4.2. Cluster networks
4.3. Regression networks
5. Probabilistic analysis
5.1. The network likelihood function
5.2. The sample likelihood
5.3. Prior probability of the weights
5.4. Posterior analysis
6. Analyzing weights
6.1. Cost functions
6.2. Weight evaluation
6.3. Minimum encoding methods
7. Applications to network training
7.1. Weight variance and elimination
7.2. Prediction and generalization error
7.3. Adjustments for missing values
8. Conclusion
-----------------------
Calculating Second Derivatives on Feed-Forward Networks
by Wray L. Buntine and Andreas S. Weigend
available as /pub/neuroprose/buntine.second.ps.Z
Recent techniques for training connectionist feed-forward networks
require the calculation of second derivatives, for example to compute
error bars for weights and network outputs and to eliminate weights. This
note describes some exact algorithms for calculating second
derivatives. They require, in the worst case, approximately $2K$
back/forward-propagation cycles, where $K$ is the number of nodes in
the network. For networks with two hidden layers or fewer, the
computation can be much quicker. Three previous approximations
(ignoring some components of the second derivative, numerical
differentiation, and scoring) are also reviewed and compared.
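
As a small, hedged illustration of the "numerical differentiation"
approximation mentioned above (not the exact algorithms of the note),
the diagonal of the Hessian of the cost can be estimated by central
differences of the gradient with respect to each weight; the function
names and step size are assumptions for the sketch:

  import numpy as np

  def hessian_diagonal(grad_fn, weights, eps=1.0e-4):
      # Estimate d2E/dw_i^2 for each weight w_i by a central difference
      # of the gradient: (g_i(w + eps*e_i) - g_i(w - eps*e_i)) / (2*eps).
      # grad_fn(w) is assumed to return the gradient dE/dw as an array.
      diag = np.zeros_like(weights)
      for i in range(len(weights)):
          step = np.zeros_like(weights)
          step[i] = eps
          diag[i] = (grad_fn(weights + step)[i]
                     - grad_fn(weights - step)[i]) / (2.0 * eps)
      return diag

Each diagonal entry costs two extra gradient evaluations here, which is
one reason the exact algorithms described in the note (roughly $2K$
back/forward cycles in total) can be preferable for larger networks.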