Back-propagation

Thu Nov 22 10:35:18 EST 1990

In studying back-propagation, I find a degree of uncertainty as to exactly
what the algorithm is.  In particular, we all know and love the parameter
ETA which specifies how big a step to take, but is the step taken
simply ETA times the derivative of the error, or is the derivative vector
normalised so that the step taken is ETA in weight space?  I have
experimented with both of these variations and observe that normalising
the size of the step taken produces faster convergence on parity and XOR
problems, but can also introduce oscillation.  Surprisingly, I have not
been able to observe oscillatory behaviour when using un-normalised
derivative steps.  (I refer to oscillation of the solution across a
narrow valley in the error surface).  I assume then that proponents of
momentum (which dates back at least to Rumelhart et al.) have all been
normalising the derivative vector to achieve fixed-size steps in
weight space.  Is this assumption correct?  If not, then can somebody
point me to a problem that generates oscillatory behaviour (and please
indicate the value of ETA also)?
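
To make the two variants concrete, here is a minimal sketch in Python.
The function names plain_step and normalised_step are made up for this
illustration, and the gradient of the error with respect to the weights
is assumed to have been computed already by the backward pass.

```python
import math

def plain_step(weights, grad, eta):
    """Un-normalised variant: the step is eta times the (negative) gradient,
    so its length in weight space is eta * |grad|."""
    return [w - eta * g for w, g in zip(weights, grad)]

def normalised_step(weights, grad, eta):
    """Normalised variant: the gradient is scaled to unit length first,
    so every step has length exactly eta in weight space."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # Zero gradient: no direction to move in (assumed handling).
        return list(weights)
    return [w - eta * g / norm for w, g in zip(weights, grad)]

def step_length(old, new):
    """Euclidean distance moved in weight space."""
    return math.sqrt(sum((n - o) ** 2 for o, n in zip(old, new)))
```

For example, with a gradient of (3, 4) and ETA = 0.1, the un-normalised
step moves a distance of 0.5 in weight space, while the normalised step
moves a distance of 0.1 in the same direction.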

			Len Hamey
			len at mqcomp.mqcs.mq.oz.au