Back-propagation

LAI-WAN CHAN LWCHAN%CUCSD.CUHK.HK at VMA.CC.CMU.EDU
Thu Nov 22 21:35:00 EST 1990


> From: IN%"len at mqcomp.mqcs.mq.oz.au"  "Len Hamey" 23-NOV-1990 09:39:15.77

> In studying back-propagation, I find a degree of uncertainty as to exactly
> what the algorithm is.  In particular, we all know and love the parameter
> ETA which specifies how big a step to take, but is the step taken
> simply ETA times the derivative of the error, or is the derivative vector
> normalised so that the step taken is ETA in weight space?  I have

The standard back-propagation algorithm uses the gradient descent method,
in which the step taken is ETA times the gradient.  The addition of the
momentum term reduces some of the oscillations.  However, the convergence
depends very much on the step size (i.e. on ETA and the momentum).
A small step size gives a smooth but slow learning path, while a large
step size gives an oscillatory path.  The oscillatory path sometimes
converges faster, but too much oscillation is hazardous to your nets!
The two updating methods that you mentioned effectively adopt two
different sets of step sizes, and this is why the network shows different
convergence speeds.  In fact, the optimal values of ETA (gradient term) and
ALPHA (momentum term) depend on the training patterns and the size of the
network.  Finding these values is a headache.
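
For concreteness, here is a minimal sketch (in Python with NumPy; not from
the original question, and the function and parameter names are only
illustrative) of the two updates being compared, with the momentum term
included:

import numpy as np

def bp_update(w, grad, prev_dw, eta=0.1, alpha=0.9, normalise=False):
    # One weight update.  With normalise=False the step is simply ETA
    # times the gradient (the standard rule); with normalise=True the
    # gradient is first scaled to unit length, so the step taken has
    # size ETA in weight space.
    if normalise:
        grad = grad / (np.linalg.norm(grad) + 1e-12)
    dw = -eta * grad + alpha * prev_dw   # gradient step plus momentum
    return w + dw, dw

# toy example: one step on E(w) = 0.5*||w||^2, whose gradient is w itself
w = np.array([1.0, -2.0])
w, prev_dw = bp_update(w, grad=w, prev_dw=np.zeros_like(w))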
There are methods that automatically adapt the step size so as to speed up
the training.  I have worked on adaptive learning [1], which I found very
useful and efficient.  Other commonly used training methods that show faster
convergence include the delta-bar-delta method [2] and the conjugate
gradient method (used in optimisation).  A comparison of these methods can
be found in [3].
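
As a rough illustration of the per-weight step-size adaptation idea behind
the delta-bar-delta method (this is only a sketch of the general scheme;
the constants kappa, phi and theta below are illustrative defaults, not
values taken from [2]):

import numpy as np

def dbd_step(w, grad, rates, dbar, kappa=0.01, phi=0.1, theta=0.7):
    # Each weight keeps its own learning rate: increase it additively
    # when the current gradient agrees in sign with an exponential
    # average of past gradients (dbar), and shrink it multiplicatively
    # when they disagree.
    agree = np.sign(grad) * np.sign(dbar)
    rates = np.where(agree > 0, rates + kappa,
             np.where(agree < 0, rates * (1.0 - phi), rates))
    w = w - rates * grad                        # per-weight gradient step
    dbar = (1.0 - theta) * grad + theta * dbar  # running average of grads
    return w, rates, dbar

# toy example on E(w) = 0.5*||w||^2 (gradient is w itself)
w = np.array([1.0, -2.0])
rates, dbar = np.full_like(w, 0.1), np.zeros_like(w)
for _ in range(10):
    w, rates, dbar = dbd_step(w, grad=w, rates=rates, dbar=dbar)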

References:

[1] L-W. Chan & F. Fallside, "An adaptive training algorithm for back
        propagation networks", Computer Speech and Language, Vol. 2, 1987,
        pp. 205-218.
[2] R.A. Jacobs, "Increased rates of convergence through learning rate
        adaptation", Neural Networks, Vol. 1, 1988, pp. 295-307.
[3] L-W. Chan, "Efficacy of different learning algorithms of the back
        propagation network", Proceedings of the IEEE Region 10 Conference
        on Computer and Communication Systems (TENCON'90), 1990, Vol. 1,
        pp. 23-27.

Lai-Wan Chan,
Computer Science Dept,
The Chinese University of Hong Kong,
Shatin, N.T.,
Hong Kong.

email : lwchan at cucsd.cuhk.hk (bitnet)
tel :   (+852) 695 2795
FAX :   (+852) 603 5024



