Training XOR with BP

john kolen kolen-j at cis.ohio-state.edu
Thu Mar 18 05:09:06 EST 1993


You can find the same type of graphs in (Kolen and Goel, 1991), where we
reported, among other things, results of experiments testing the effect of
the initial weight range when training XOR on 2-2-1 feedforward nets with
backprop.  This work was expanded in (Kolen and Pollack, 1990), where we
examined the boundaries of t-convergent regions in weight space (regions
from which the network reaches some convergence criterion within t epochs).
What we found was that the boundary was not smooth (it is not the case that
increasing t simply adds rings of convergent regions) but very "cliffy":
small differences in initial weights can mean the difference between
converging in 50 epochs and taking many more than any of us are willing
to wait for.
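
For anyone who wants to poke at this themselves, here is a minimal sketch
of that kind of experiment in Python/numpy (not our original code; sigmoid
units, plain batch gradient descent, and the learning rate, convergence
criterion, and epoch limit are all illustrative choices):

import numpy as np

def train_xor(rng, weight_range, lr=0.5, max_epochs=20000, criterion=0.04):
    """Train a 2-2-1 sigmoid net on XOR with plain batch backprop.

    Returns the number of epochs needed to drive the MSE below
    `criterion`, or None if the run is not t-convergent for
    t = max_epochs.
    """
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    # Initial weights and biases drawn uniformly from
    # (-weight_range, +weight_range).
    W1 = rng.uniform(-weight_range, weight_range, (2, 2))
    b1 = rng.uniform(-weight_range, weight_range, (1, 2))
    W2 = rng.uniform(-weight_range, weight_range, (2, 1))
    b2 = rng.uniform(-weight_range, weight_range, (1, 1))

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for epoch in range(1, max_epochs + 1):
        # Forward pass over all four patterns at once.
        h = sigmoid(X @ W1 + b1)
        o = sigmoid(h @ W2 + b2)
        err = o - y
        if np.mean(err ** 2) < criterion:
            return epoch
        # Backward pass (the sigmoid derivative is s * (1 - s)).
        d_o = err * o * (1 - o)
        d_h = (d_o @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_o
        b2 -= lr * d_o.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0, keepdims=True)
    return None

rng = np.random.default_rng(0)
for r in (0.1, 0.5, 1.0, 2.0):
    runs = [train_xor(rng, r) for _ in range(20)]
    done = [e for e in runs if e is not None]
    med = int(np.median(done)) if done else "-"
    print(f"range +/-{r}: {len(done)}/20 converged, median epochs {med}")

Sweeping weight_range finely and plotting epochs-to-convergence against the
initial weights gives graphs of the kind discussed above.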

I agree with Luis: "local minimum" is overused in the connectionist
community to describe networks that simply take a VERY long time to
converge.  A good example of this is a 2-2-2-1 network learning XOR,
started with very small weights selected from a uniform distribution on
(-0.1, 0.1).  Such networks take a long time to learn the target mapping,
but not because of a local minimum.  Rather, they are stuck in a very flat
region near the saddle point at all-zero weights.
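
A quick way to see this (again a sketch in Python/numpy, not code from
either paper; the 2-2-2-1 architecture and the uniform (-0.1, 0.1)
initialization follow the setup above) is to compute the backprop gradient
at such an initial point and look at its size:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR patterns and targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# 2-2-2-1 net, weights and biases drawn from uniform(-0.1, 0.1).
rng = np.random.default_rng(0)
shapes = [(2, 2), (2, 2), (2, 1)]
Ws = [rng.uniform(-0.1, 0.1, s) for s in shapes]
bs = [rng.uniform(-0.1, 0.1, (1, s[1])) for s in shapes]

# Forward pass, keeping every layer's activations for backprop.
acts = [X]
for W, b in zip(Ws, bs):
    acts.append(sigmoid(acts[-1] @ W + b))

# Backward pass for the MSE loss; collect the weight-gradient norms.
delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
grad_norms = []
for i in reversed(range(len(Ws))):
    grad_norms.append(np.linalg.norm(acts[i].T @ delta))
    if i > 0:
        delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])

print("weight-gradient norms, output layer first:", grad_norms)

At exactly zero weights every sigmoid outputs 0.5 and the gradients vanish
(a saddle point, not a minimum); with weights this small they are merely
tiny, and each backward step scales the deltas by the small weights and by
s(1-s) <= 0.25, which is why the error sits on a long plateau before it
finally drops.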

Refs

Kolen, J. F. and Goel, A. K. (1991). Learning in PDP networks:
Computational complexity and information content.  _IEEE Transactions on
Systems, Man, and Cybernetics_, 21, pp. 359-367.  (Available through
neuroprose: kolen.pdplearn*)

Kolen, J. F. and Pollack, J. B. (1990). Backpropagation is sensitive to
initial conditions.  _Complex Systems_, 4(3), pp. 269-280.  (Available
through neuroprose: kolen.bpsic*)


