A Harder Learning Problem

Kevin.Lang@G.GP.CS.CMU.EDU
Fri Aug 5 17:43:30 EDT 1988


I tried standard back-propagation on the spiral problem, and found that it
is a useful addition to the standard set of benchmark problems.  It is as
small and easily stated as the usual encoder and shifter problems, but many
times harder.  
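
The message does not spell out the training set itself, but a minimal sketch of
one common two-spirals construction (two interlocking spirals of 97 points
each, 194 examples in total) is given below; the particular radius and angle
constants are assumptions for illustration, not taken from this post.

   import math

   def two_spirals(points_per_spiral=97, max_radius=6.5):
       """Two interlocking spirals: 194 (x, y) points with 0/1 labels."""
       data = []
       for i in range(points_per_spiral):
           angle = i * math.pi / 16.0
           radius = max_radius * (104 - i) / 104.0
           x, y = radius * math.sin(angle), radius * math.cos(angle)
           data.append(((x, y), 1.0))     # first spiral
           data.append(((-x, -y), 0.0))   # second spiral (point reflection)
       return data

   dataset = two_spirals()
   assert len(dataset) == 194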

My network has a 2-5-5-5-1 structure, consisting of 2 input units, three
hidden layers of 5 units each, and 1 output unit.  Each layer is connected
to every layer above it (not just the next one), to provide quick pathways
along which to propagate errors.  This network contains 138 weights, which
seems about right for a training set with 194 examples.  
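
Counting one bias per unit as a weight, that connectivity does come to 138
free parameters.  A minimal sketch of the forward pass for such a
shortcut-connected 2-5-5-5-1 network follows; the logistic activation and the
initialization scale are assumptions, not details taken from this message.

   import numpy as np

   rng = np.random.default_rng(0)
   layer_sizes = [2, 5, 5, 5, 1]           # the 2-5-5-5-1 structure

   # Each non-input layer receives connections from every earlier layer
   # (including the inputs) plus a bias -- the "quick pathways" above.
   weights, biases = [], []
   for k in range(1, len(layer_sizes)):
       fan_in = sum(layer_sizes[:k])
       weights.append(rng.normal(0.0, 0.1, size=(layer_sizes[k], fan_in)))
       biases.append(np.zeros(layer_sizes[k]))

   n_params = sum(w.size for w in weights) + sum(b.size for b in biases)
   assert n_params == 138                  # matches the count quoted above

   def sigmoid(z):
       return 1.0 / (1.0 + np.exp(-z))

   def forward(x):
       """Each layer sees the concatenation of all earlier layers' outputs."""
       activations = [np.asarray(x, dtype=float)]
       for W, b in zip(weights, biases):
           activations.append(sigmoid(W @ np.concatenate(activations) + b))
       return activations[-1][0]           # scalar output of the output unit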

The network was trained with a learning rate that was gradually increased
from .001 to .002 and a momentum that was gradually increased from .5 to
.95.  A few brief excursions to .005 for the learning rate caused
derailments (the cosine of the angle between successive steps went
negative).  At CMU, we generally use target values of 0.2 and 0.8 in place
of 0.0 and 1.0, in order to reduce the need for big weights.   
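
Spelled out, the update rule being described is plain batch gradient descent
with momentum, with both rates ramped up on a schedule.  The sketch below is
a generic illustration: gradient_fn, lr_schedule, and momentum_schedule are
hypothetical placeholders, and the 0.2/0.8 targets would live inside the
error whose gradient gradient_fn returns.

   import numpy as np

   def train(gradient_fn, w, epochs, lr_schedule, momentum_schedule):
       """Batch gradient descent with momentum (a generic sketch).

       gradient_fn(w) returns the gradient of the total error over the
       whole training set (all cases presented between weight updates)."""
       step = np.zeros_like(w)
       for epoch in range(epochs):
           lr = lr_schedule(epoch)          # e.g. ramped from .001 up to .002
           mom = momentum_schedule(epoch)   # e.g. ramped from .5 up to .95
           prev_step = step
           step = mom * step - lr * gradient_fn(w)
           w = w + step
           # Derailment check: the cosine of the angle between successive
           # steps going negative means the search has started to oscillate.
           denom = np.linalg.norm(step) * np.linalg.norm(prev_step)
           if denom > 0 and float(step @ prev_step) / denom < 0:
               print("epoch %d: successive steps oppose each other" % epoch)
       return w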

Counting an error whenever the output value for a case lies on the wrong
side of 0.5, the network had the following error history as it was trained
using the batch version of back-propagation (all cases presented between
weight updates).  This run chewed up about 9 CPU minutes on our Convex. 

   epochs    errors
    2,000      75
    4,000      74
    6,000      64
    8,000      14  (big improvement here)
   10,000       8
   12,000       4
   14,000       2
   16,000       2  (struggling)
   18,000       0

The average weight at this point is about 3.4.  Since the output values for
the two spirals now lie on opposite sides of 0.5, it is a simple matter to
grow the weights and separate the values further.  For example, about 1,000
more epochs are required to pull the output values below 0.4 and above 0.6
for the two spirals.
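
For concreteness, the two criteria used above (the wrong-side-of-0.5 error
count, and the stricter 0.4/0.6 separation) can be written as below; the
function names are merely illustrative.

   def count_errors(outputs, targets, threshold=0.5):
       """Cases whose output lies on the wrong side of the 0.5 threshold."""
       return sum((o > threshold) != (t > threshold)
                  for o, t in zip(outputs, targets))

   def fully_separated(outputs, targets, low=0.4, high=0.6):
       """True once every output has been pulled below 0.4 or above 0.6."""
       return all(o < low if t < 0.5 else o > high
                  for o, t in zip(outputs, targets))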
