backprop vs Boltzmann

Eric Hartman eric at mcc.com
Thu Jul 7 22:49:02 EDT 1988


Regarding the discussion about back propagation vs. the Boltzmann machine:

We agree with Hinton's answer to Lorien Pratt on this question.  Since
we have recently completed an extensive comparison of the mean field
theory [1] and back propagation learning algorithms in a variety of
learning and generalization situations [2], we would like to
supplement his answer with the following remarks:

#1
For the mirror symmetry and clumps problems, mean field theory (MFT)
and back propagation (BP) perform equally well with respect to the
number of learning cycles needed and the generalization percentage
achieved.  The two algorithms even tend to find remarkably similar
solutions.  These results hold both for fixed training sets and for
continuous (ongoing) learning.  (These conclusions assume that the two
algorithms use the same activation function, either [0,1] or [-1,1],
and that the midpoint is used as the success criterion in
generalization.)
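
For concreteness, the midpoint success criterion mentioned above can
be stated as follows (a minimal Python sketch; the function name and
the 0.5 / 0.0 midpoints for [0,1] and [-1,1] units are our
illustration, not code from ref. [2]):

    import numpy as np

    def generalization_correct(outputs, targets, unit_range="[-1,1]"):
        # Midpoint criterion: an output counts as correct when it lies
        # on the same side of the activation function's midpoint as the
        # corresponding target value.
        midpoint = 0.5 if unit_range == "[0,1]" else 0.0
        return np.all((outputs > midpoint) == (targets > midpoint))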

#2
The two algorithms also appear equivalent with respect to scaling
properties: number of training patterns and number of hidden layers.
[In other words, our experience on the latter point differs somewhat
from Hinton's, which strictly speaking concerned the original
Boltzmann machine (BM).]

#3
We have checked the MFT approximation more thoroughly than was done in
ref. [1] and find that it is indeed very good.
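
For readers who do not have ref. [1] at hand: the approximation in
question replaces the stochastic unit averages of the Boltzmann
machine by deterministic mean field variables V_i satisfying (in the
[-1,1] convention)

    V_i = tanh( (1/T) * sum_j w_ij * V_j ),

iterated to a fixed point at each temperature of the annealing
schedule.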

#4
The real advantage of MFT over BP is twofold: it allows for a more
general architecture and usage [see #5 below], and it is more natural
for a VLSI implementation.  Also, at the simulation level, the
parallelism inherent in MFT can be fully exploited on a SIMD
architecture, as sketched below [this is not easy with the original
BM].
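
To make the SIMD point concrete: because the mean field equations
update every unit from the same current state, one sweep is a single
matrix-vector product.  A minimal sketch (the annealing schedule and
sweep count below are illustrative choices, not the settings used in
refs. [1,2]):

    import numpy as np

    def mft_sweep(V, W, T):
        # Synchronous mean field update: all units are updated in
        # parallel from the same copy of V, so one sweep reduces to a
        # single matrix-vector product.
        return np.tanh(W @ V / T)

    def settle(V, W, temperatures, sweeps_per_T=20):
        # Anneal from high to low temperature, iterating the mean field
        # equations toward a fixed point at each temperature.
        for T in temperatures:
            for _ in range(sweeps_per_T):
                V = mft_sweep(V, W, T)
        return V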

#5
BP is feed-forward in nature, so there is a very strong distinction
between input and output units.  Because MFT is bidirectional, no such
fixed distinction need exist, and its application areas therefore
extend beyond pure feature recognition (input-output mappings).  In
particular, MFT can be used as a content-addressable memory.  In
ref. [2] we explore this possibility, using hidden units, with
substantial success.  So far we have achieved storage capacities of
approximately 2N (where N is the number of visible units) for
retrieval with clamping (the given bits are known with certainty), and
of at least roughly N for error correction (no bits are known with
certainty and hence none can be clamped).  We consider these
capacities to be lower bounds, as we are quite confident that we will
find still better ways of using the learning algorithm and the hidden
units to increase them further.  Also, regarding the discussion a
month or so ago about fully distributed analog content-addressable
memories, we have investigated this direction and find that MFT is
quite capable of learning analog as well as discrete patterns.
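
As a rough illustration of the two retrieval modes described above (a
sketch only; it does not reproduce the weights, annealing schedule, or
hidden-unit usage of ref. [2]): with clamping, the known bits are held
fixed while the remaining visible and hidden units settle; for error
correction, the noisy pattern merely initializes the visible units and
nothing is clamped.

    import numpy as np

    def retrieve(W, given_bits, clamp_mask, n_units, temperatures,
                 sweeps_per_T=20):
        # clamp_mask marks the visible units whose bits are known with
        # certainty; they are re-imposed after every sweep while the
        # free units (unknown visible plus hidden) relax under the mean
        # field dynamics.
        V = np.zeros(n_units)
        V[clamp_mask] = given_bits
        for T in temperatures:
            for _ in range(sweeps_per_T):
                V = np.tanh(W @ V / T)
                V[clamp_mask] = given_bits
        return np.sign(V)

    def error_correct(W, noisy_pattern, n_hidden, temperatures,
                      sweeps_per_T=20):
        # No bits are known with certainty, so none are clamped: the
        # noisy pattern only sets the initial state of the visible
        # units, and all units relax freely.
        V = np.concatenate([noisy_pattern, np.zeros(n_hidden)])
        for T in temperatures:
            for _ in range(sweeps_per_T):
                V = np.tanh(W @ V / T)
        return np.sign(V[:len(noisy_pattern)])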


1. C. Peterson and J.R. Anderson,
   "A Mean Field Theory Learning Algorithm for Neural Networks",
   MCC Technical Report MCC-EI-259-87; published in Complex Systems
   1, 995 (1987).

2. C. Peterson and E. Hartman,
   "Explorations of the Mean Field Theory Learning Algorithm",
   MCC Technical Report MCC-ACA-ST-064-88.
   [This paper is still subject to some polishing and will be
    announced on connectionists within a few weeks' time.]



          -----------------------------

          Carsten Peterson and Eric Hartman


