How to ensure that back-propagation separates separable patterns

Geoffrey Hinton hinton at ai.toronto.edu
Thu Oct 27 15:39:44 EDT 1988


We have recently obtained the following results, and would like to know if
anyone has seen them previously written up.  The description is informal but
accurate:


There has been some interest in understanding the behavior of back-propagation
in feedforward nets with no hidden neurons.  This is a particularly simple
case that does not require the full power of back-propagation (it can also be
approached from the point of view of perceptrons), but we believe that it
provides useful insights into the structure of the error surface, and that
understanding this case is a prerequisite for understanding the more general
case with hidden layers.

In [1], the authors give examples showing that even when a training set is
separable, a net trained by back-propagation (gradient) search may get stuck
in a solution that fails to separate it.  The point of this note is that if
one instead uses a threshold procedure, in which values ``beyond'' the targets
are not penalized, then such counterexamples cease to exist; in fact one
obtains a convergence theorem that closely parallels the perceptron
convergence theorem: under the continuous gradient adjustment procedure, a
separating set of weights is reached in finite time from any initial weight
configuration.
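
For concreteness, here is a minimal numerical sketch of such a threshold
criterion for a single unit with no hidden layers.  The tanh output, the +/-1
targets, the margin value tau, and the toy separable data are illustrative
assumptions, not the exact setup analyzed here; discrete small-step descent
stands in for the continuous gradient adjustment of the theorem.  The point is
only that an output already ``beyond'' its target contributes neither error
nor gradient.

    # Sketch: thresholded ("don't penalize beyond the target") LMS for one unit.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linearly separable data: label by the sign of a fixed hyperplane.
    X = rng.normal(size=(40, 3))
    X[:, -1] = 1.0                      # constant bias input
    t = np.sign(X @ np.array([1.0, -2.0, 0.25]))

    tau = 0.5                           # margin the output must be pushed past
    w = rng.normal(scale=0.1, size=3)   # arbitrary initial weights

    def thresholded_error_grad(w):
        y = np.tanh(X @ w)              # unit output in (-1, 1)
        # One-sided residual: zero once the output is "beyond" its target side.
        r = np.where(t > 0, np.maximum(tau - y, 0.0), np.minimum(-tau - y, 0.0))
        err = 0.5 * np.sum(r ** 2)
        # Gradient of err w.r.t. w (chain rule through tanh).
        grad = -((r * (1 - y ** 2)) @ X)
        return err, grad

    eta = 0.05                          # small step approximating continuous descent
    for step in range(5000):
        err, grad = thresholded_error_grad(w)
        if err == 0.0:                  # every pattern beyond its target: separated
            break
        w -= eta * grad

    print(step, err, np.mean(np.sign(X @ w) == t))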

We also show how to modify the example given in [2] to conclude that there
is a training set consisting of 125 binary vectors and a network configuration
for which there are nonglobal local minima, even if threshold LMS is used.
In this example, the training set is of course not separable.

Finally, we compare our results to more classical pattern recognition
theorems, in particular to the convergence of the relaxation procedure for
threshold LMS with linear output units ([3]), and show that the continuous
adjustment approach is essential in the study of the nonlinear case.
Another important difference from the linear case is that in the latter,
nonglobal local minima cannot occur even when the data is not separable.
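
For reference, one common form of the relaxation (threshold LMS) criterion for
a linear output unit, stated here up to the normalization conventions used in
[3], is

    J(w) = {1 \over 2} \sum_{i :\, t_i\, w \cdot x_i \le b} ( b - t_i\, w \cdot x_i )^2 ,

where the t_i = \pm 1 are the class labels and b > 0 is a margin; only
patterns not yet pushed past the margin contribute to the error or to its
gradient.  The nonlinear case discussed above replaces the linear response
w \cdot x_i by a sigmoidal output and applies the thresholding at the target
values instead.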

References:

[1]  Brady, M., R. Raghavan and J. Slawny, ``Backpropagation fails to separate
where perceptrons succeed,'' submitted for publication.  Summarized version in
``Gradient descent fails to separate,''   in {\it Proc. IEEE International
Conference on Neural Networks}, San Diego, California, July 1988, Vol. I,
pp. 649-656.

[2]  Sontag, E.D., and H.J. Sussmann, ``Backpropagation can give rise to
spurious local minima even for networks without hidden layers,'' submitted.

[3]  Duda, R.O., and P.E. Hart, {\it Pattern Classification and Scene
Analysis}, Wiley, New York, 1973.

Geoff Hinton
Eduardo Sontag
Hector Sussmann

