Backprop Feedback Gain
Larry Fast
72247.2225 at CompuServe.COM
Mon Oct 21 23:05:00 EDT 1991
I'm expanding the PDP Backprop program (McClelland & Rumelhart, version 1.1) to
compensate for the following problem:
As Backprop passes the error back through multiple layers, the gradient has
a built-in tendency to decay. At the output, the maximum slope of
the 1/(1 + e^(-sum)) activation function is 0.5.
Each successive layer multiplies this slope by a maximum of 0.5.
The maximum gains at the various layers (where n is the output layer) are
as follows; a short C sketch after the list reproduces these numbers:
max slope at layer n   = 0.5
max slope at layer n-1 = 0.25
max slope at layer n-2 = 0.125
max slope at layer n-3 = 0.0625
max slope at layer n-4 = 0.03125 ....
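For concreteness, here is a small C sketch (C, since that is what the PDP code
is written in) that tabulates the figures above. It is only an illustration of
the decay argument, not code from the PDP sources; MAX_SLOPE is the per-layer
maximum slope quoted above.

    #include <stdio.h>

    /* Per-layer maximum of the activation-function slope, as quoted above. */
    #define MAX_SLOPE 0.5

    int main(void)
    {
        double gain = 1.0;
        int depth;

        /* The maximum possible error gain after k layers is MAX_SLOPE^k. */
        for (depth = 0; depth < 5; depth++) {
            gain *= MAX_SLOPE;
            if (depth == 0)
                printf("max slope at layer n   = %g\n", gain);
            else
                printf("max slope at layer n-%d = %g\n", depth, gain);
        }
        return 0;
    }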
It has been suggested (by a couple of sources) that an attempt should be
made to have each layer learn at the same rate. To this end, I'm installing
a gain factor on the error being backpropagated.
The new error function is: errorPropGain * act * (1 - act)
The nominal value that makes sense is 2 (or more): with the maximum slope of 0.5
quoted above, a gain of 2 brings the best-case per-layer gain up to 1, so the
maximum error signal can propagate unattenuated.
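To make the intended change concrete, here is a rough sketch of where
errorPropGain would multiply into the backpropagated delta for a hidden layer.
The identifiers (errorPropGain, act, delta_next, weight, MAX_UNITS) are
illustrative only and are not the names used in the PDP 1.1 sources.

    #define MAX_UNITS 100

    double errorPropGain = 2.0;   /* nominal value suggested above */

    void compute_hidden_deltas(int nthis, int nnext,
                               double act[],               /* activations of this layer    */
                               double delta_next[],        /* deltas of the layer above    */
                               double weight[][MAX_UNITS], /* weight[j][i]: unit i -> unit j above */
                               double delta_this[])        /* output: deltas of this layer */
    {
        int i, j;

        for (i = 0; i < nthis; i++) {
            double err = 0.0;

            /* error fed back from the layer above */
            for (j = 0; j < nnext; j++)
                err += delta_next[j] * weight[j][i];

            /* proposed modification: scale the sigmoid derivative by
               errorPropGain so that, with the 0.5 figure above, the
               best-case per-layer gain becomes 2 * 0.5 = 1 */
            delta_this[i] = err * errorPropGain * act[i] * (1.0 - act[i]);
        }
    }

Note that only the backward pass changes; the forward activation function is
left untouched.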
Has anyone else tried this, or any other method of flattening out the learning
rate in deep layers? Any info regarding more recent releases of PDP or
a users' group would also be helpful.
Please respond directly to 72247.2225 at compuserve.com
Thanks, Larry Fast