We noticed LMS fails to separate

neural!jsd@ihnp4.att.com
Sun Oct 30 00:08:28 EDT 1988

Yes, we noticed that a Least-Mean-Squares (LMS) network,
even one with no hidden units, fails to separate some problems.
Ben Wittner spoke at the IEEE NIPS meeting in Denver, November
1987, describing TWO failings of this type.

He gave an example of a situation in which LMS algorithms
(including those commonly referred to as back-prop) are
metastable, i.e. they fail to separate the data for certain initial
configurations of the weights.  He went on to describe another case
in which the algorithm actually leaves the solution region after
starting within it.
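
To make the second failure concrete, here is a toy sketch. It is not one of
the examples from the talk or the paper: the four data points, the single
linear output unit, the starting weights, and the step size are all
assumptions picked for illustration. The data are separable on a line, and
the starting weights classify every point correctly, yet plain batch
gradient descent on the squared error walks out of the solution region.

```python
# Hypothetical 1-D data: (input x, target t).  Any threshold between
# x = -1 and x = 0.5 separates the two classes.
DATA = [(-1.0, -1.0), (-1.0, -1.0), (0.5, 1.0), (10.0, 1.0)]


def errors(a, b):
    """Count patterns whose output a*x + b has the wrong sign."""
    return sum(1 for x, t in DATA if (a * x + b) * t <= 0)


def lms_descent(a, b, rate=0.004, steps=2000):
    """Batch gradient descent on the summed squared error (a*x + b - t)^2."""
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - t) * x for x, t in DATA)
        grad_b = sum(2 * (a * x + b - t) for x, t in DATA)
        a -= rate * grad_a
        b -= rate * grad_b
    return a, b


start = (1.0, 0.25)           # assumed starting weights, inside the solution region
final = lms_descent(*start)   # ends near the least-squares fit

print("errors at start:", errors(*start))
print("errors at end:  ", errors(*final))
```

In this sketch the point at x = 10 is already far on the correct side, but
LMS still charges it for overshooting its target. Reducing that charge
flattens the line until the point at x = 0.5 falls on the wrong side.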

He also pointed out that this can lead to learning
sessions in which the categorization performance of back-prop
nets (with or without hidden units) is not a monotonically
improving function of learning time.  

Finally, he presented a couple of ways of modifying the algorithm
to get around these problems, and proved a convergence
theorem for the modified algorithms. One of the key ideas
is something that has been mentioned in several recent postings,
namely, to have zero penalty when the training pattern is
well-classified or "beyond".
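
As a rough illustration of the zero-penalty idea, the sketch below compares
the ordinary LMS penalty with a one-sided variant. The function names, the
targets of +1 and -1, and the squared form of the one-sided penalty are
assumptions made for this example; the modified algorithms in the paper may
be defined differently.

```python
def lms_penalty(output, target):
    """Ordinary squared error: overshooting the target is penalized too."""
    return (output - target) ** 2


def one_sided_penalty(output, target):
    """Zero once the output reaches the target or goes beyond it on the
    correct side (targets assumed to be +1 or -1); squared shortfall
    otherwise."""
    margin = output * target
    if margin >= 1.0:
        return 0.0
    return (1.0 - margin) ** 2


# A pattern with target +1 whose output is far "beyond" the target:
print(lms_penalty(10.25, 1.0), one_sided_penalty(10.25, 1.0))
# A pattern with target +1 on the correct side but short of the target:
print(lms_penalty(0.75, 1.0), one_sided_penalty(0.75, 1.0))
```

Under the one-sided penalty, a pattern that is already classified with room
to spare contributes no gradient, so it cannot pull the weights away from a
separating solution.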

We cited Minsky & Papert as well as Duda & Hart; we believe 
they were more-or-less aware of these bugs in LMS, although they 
never presented explicit examples of the failure modes.

Here is the abstract of our paper in the proceedings,
_Neural Information Processing Systems -- Natural and Synthetic_,
Denver, Colorado, November 8-12, 1987, Dana Anderson Ed., AIP Press.
We posted the abstract back in January '88, but apparently it
didn't get through to everybody.  Reprints of the whole paper are available.

   Strategies for Teaching Layered Networks Classification Tasks

                     Ben S. Wittner (1)
                     John S. Denker
                   AT&T Bell Laboratories
                 Holmdel, New Jersey 07733

ABSTRACT:  There is a widespread misconception that the delta-rule is
in some sense guaranteed to work on networks without hidden units.  As
previous authors have mentioned, there is no such guarantee for
classification tasks.  We will begin by presenting explicit
counter-examples illustrating two different interesting ways in which
the delta rule can fail.  We go on to provide conditions which do
guarantee that gradient descent will successfully train networks
without hidden units to perform two-category classification tasks.  We
discuss the generalization of our ideas to networks with hidden units
and to multi-category classification tasks.

(1) Currently at NYNEX Science and Technology /  500 Westchester Ave.
White Plains, NY 10604
