Local Minima and XOR
Alexis Wieland
alexis%yummy at gateway.mitre.org
Fri May 5 16:38:15 EDT 1989
I apologize for this rather late reply to a note; I was on vacation.
In a note about searching weight spaces for local minima (I'm afraid I
don't remember the name; it was from UCSD and was part of a NIPS paper)
it was noted that they found local minima in a network for computing
sin().  There was surprise, though, that there weren't any local minima
for the strictly layered 2-in -> 2 -> 1-out XOR net.  Many people have
complained about getting caught in local minima with this network ....
It would seem that if you were trying to find a point where the gradient
literally went to zero, you couldn't find it ... because it's out
at infinity.  That network (when it "works") creates a ridge (or valley)
diagonally through the 2D input space.  If both of the hidden units get
stuck creating the same side of the ridge, then only 3 of the 4 points are
correctly classified ... you're in the dastardly "local min."  But since
you can still make those three points more accurate (closer to 1 or 0) by
pushing the arc (weight) values towards infinity, you'll always have a
non-zero gradient.  What you have is a corner of weight space that, once
you're in it, you can't simply learn your way out of.  A learning "black
hole," it's something like a saddle point in weight space.  By the way, it
is therefore intuitively obvious why having more hidden units reduces the
chances of getting stuck -- the more hidden units, the less likely they
will *ALL* get stuck on the same side of the ridge.  (A small numerical
sketch of the stuck configuration is given below.)
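
To make this concrete, here is a minimal sketch (mine, not from the
original note; the initialization, learning rate, and step count are
arbitrary choices for illustration) of a strictly layered 2-2-1 sigmoid
net trained by plain gradient descent on a sum-of-squares error,
deliberately started with both hidden units on the same side of the
ridge (identical weights, so the symmetry is never broken).  The error
keeps falling, the weights keep growing, the gradient never reaches
zero, and only 3 of the 4 XOR patterns end up correctly classified:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR patterns and targets
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([0.0, 1.0, 1.0, 0.0])

    # "Stuck" initialization: both hidden units carve out the same
    # half-plane (same side of the diagonal ridge); values are arbitrary.
    W1 = np.array([[1.0, 1.0],      # hidden unit 1 input weights
                   [1.0, 1.0]])     # hidden unit 2 input weights (identical)
    b1 = np.array([-0.5, -0.5])
    W2 = np.array([1.0, 1.0])       # hidden-to-output weights
    b2 = -0.5

    lr = 0.5
    for step in range(20000):
        # forward pass
        H = sigmoid(X @ W1.T + b1)            # hidden activations, (4, 2)
        y = sigmoid(H @ W2 + b2)              # outputs, (4,)

        # backprop for E = 0.5 * sum (y - t)^2
        d_out = (y - T) * y * (1.0 - y)
        gW2 = d_out @ H
        gb2 = d_out.sum()
        d_hid = np.outer(d_out, W2) * H * (1.0 - H)
        gW1 = d_hid.T @ X
        gb1 = d_hid.sum(axis=0)

        # plain gradient descent step
        W2 = W2 - lr * gW2;  b2 = b2 - lr * gb2
        W1 = W1 - lr * gW1;  b1 = b1 - lr * gb1

    H = sigmoid(X @ W1.T + b1)
    y = sigmoid(H @ W2 + b2)
    n_correct = int(((y > 0.5) == (T > 0.5)).sum())
    grad_norm = np.sqrt((gW1**2).sum() + (gb1**2).sum()
                        + (gW2**2).sum() + gb2**2)
    print("outputs       :", np.round(y, 3))
    print("correct       :", n_correct, "of 4")
    print("gradient norm :", grad_norm, "(non-zero)")
    print("max |weight|  :", max(np.abs(W1).max(), np.abs(W2).max()),
          "(still growing)")

Because the two hidden units start identical, full-batch gradient
descent keeps them identical forever, which is the "both units on the
same side of the ridge" situation: three outputs can still be nudged
closer to their targets by growing the weights, so the gradient stays
non-zero and the descent just crawls off towards infinity.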
alexis wieland -- alexis%yummy at gateway.mitre.org