Local Minima and XOR
Alexis Wieland
alexis%yummy at gateway.mitre.org
Fri May 5 16:38:15 EDT 1989
I apologize for this rather late reply to a note; I was on vacation.
In a note about searching weight spaces for local minima (I'm afraid I
don't remember the name; it was from UCSD and was part of a NIPS paper)
it was noted that they found local minima in a network for computing
sin().  There was surprise, though, that there weren't any local minima
for the strictly layered 2-in -> 2 -> 1-out XOR net.  Many people have
complained about getting caught in local minima with this network ....
It would seem that if you were trying to find a point where the gradient
literally went to zero, you couldn't find it ... because it's out
at infinity.  That network (when it "works") creates a ridge (or valley)
diagonally through the 2D input space.  If both of the hidden units get
stuck creating the same side of the ridge, then only 3 of the 4 points are
correctly classified ... you're in the dastardly "local min."  But since
you can still make those three points more accurate (closer to 1 or 0) by
pushing the arc (weight) values towards infinity, you'll always have a
non-zero gradient.  What you have is a corner of weight space that, once
you're in it, you can't simply learn your way out of.  A learning "black
hole," it's something like a saddle point in weight space.  By the way, it
is therefore intuitively obvious why having more hidden units reduces the
chances of getting stuck -- the more hidden units, the less likely they
will *ALL* get stuck on the same side of the ridge.  (A small numerical
sketch of the stuck configuration is given below.)
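
To make this concrete, here is a minimal sketch (mine, not from the
original note; the initialization, learning rate, and step count are
arbitrary choices for illustration) of a strictly layered 2-2-1 sigmoid
net trained by plain gradient descent on a sum-of-squares error,
deliberately started with both hidden units on the same side of the
ridge (identical weights, so the symmetry is never broken).  The error
keeps falling, the weights keep growing, the gradient never reaches
zero, and only 3 of the 4 XOR patterns end up correctly classified:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR patterns and targets
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([0.0, 1.0, 1.0, 0.0])

    # "Stuck" initialization: both hidden units carve out the same
    # half-plane (same side of the diagonal ridge); values are arbitrary.
    W1 = np.array([[1.0, 1.0],      # hidden unit 1 input weights
                   [1.0, 1.0]])     # hidden unit 2 input weights (identical)
    b1 = np.array([-0.5, -0.5])
    W2 = np.array([1.0, 1.0])       # hidden-to-output weights
    b2 = -0.5

    lr = 0.5
    for step in range(20000):
        # forward pass
        H = sigmoid(X @ W1.T + b1)            # hidden activations, (4, 2)
        y = sigmoid(H @ W2 + b2)              # outputs, (4,)

        # backprop for E = 0.5 * sum (y - t)^2
        d_out = (y - T) * y * (1.0 - y)
        gW2 = d_out @ H
        gb2 = d_out.sum()
        d_hid = np.outer(d_out, W2) * H * (1.0 - H)
        gW1 = d_hid.T @ X
        gb1 = d_hid.sum(axis=0)

        # plain gradient descent step
        W2 = W2 - lr * gW2;  b2 = b2 - lr * gb2
        W1 = W1 - lr * gW1;  b1 = b1 - lr * gb1

    H = sigmoid(X @ W1.T + b1)
    y = sigmoid(H @ W2 + b2)
    n_correct = int(((y > 0.5) == (T > 0.5)).sum())
    grad_norm = np.sqrt((gW1**2).sum() + (gb1**2).sum()
                        + (gW2**2).sum() + gb2**2)
    print("outputs       :", np.round(y, 3))
    print("correct       :", n_correct, "of 4")
    print("gradient norm :", grad_norm, "(non-zero)")
    print("max |weight|  :", max(np.abs(W1).max(), np.abs(W2).max()),
          "(still growing)")

Because the two hidden units start identical, full-batch gradient
descent keeps them identical forever, which is the "both units on the
same side of the ridge" situation: three outputs can still be nudged
closer to their targets by growing the weights, so the gradient stays
non-zero and the descent just crawls off towards infinity.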
alexis wieland -- alexis%yummy at gateway.mitre.org