Does backprop need the derivative ??
George Bolt
george at psychmips.york.ac.uk
Mon Feb 8 08:32:38 EST 1993
Heini Withagen wrote:
In his paper, 'An Empirical Study of Learning Speed in Back-Propagation
Networks', Scott E. Fahlman shows that with the encoder/decoder problem
it is possible to replace the derivative of the transfer function by
a constant. I have been able to reproduce this example. However, for
several other examples, it was not possible to get the network to
converge using a constant for the derivative.
- end quote -
I've looked at BP learning in MLPs w.r.t. fault tolerance and found
that the derivative of the transfer function acts to *stop* learning.
Once a unit's weights for some particular input (to that unit rather than
to the network) are sufficiently developed for it to decide whether to output
0 or 1, the weight changes become approximately zero because this derivative
is near zero. I would imagine that setting it to a constant will make an MLP
over-learn certain patterns and leave it unable to converge to a state of
equilibrium, i.e. one in which all patterns are matched to some degree.
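The effect can be seen in a toy calculation. Below is a minimal sketch (not
from the original post) in Python, assuming a sigmoid transfer function and a
hypothetical constant of 0.25: the standard error term error * f'(net)
vanishes once the unit saturates, while the constant-derivative variant keeps
pushing the weights.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Standard BP error term for a unit: f'(net) = f(net) * (1 - f(net))
    # shrinks toward zero once the unit saturates near 0 or 1, halting learning.
    def delta_standard(error, net):
        out = sigmoid(net)
        return error * out * (1.0 - out)

    # Constant-derivative variant: weight changes stay the same size even
    # after the unit has committed to outputting 0 or 1.
    def delta_constant(error, net, const=0.25):
        return error * const

    net, error = 6.0, 0.5                 # a strongly saturated unit
    print(delta_standard(error, net))     # ~0.0012 -- learning has effectively stopped
    print(delta_constant(error, net))     # 0.125   -- the unit keeps being pushed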
A better route would be to set the derivative function to a constant
over a range [-r, +r], where f( |r| ) -> 1.0. To make individual units
robust with respect to their weights, set r = c*a, where f( |a| ) -> 1.0
and c is a small constant multiplicative value.
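One way to read this proposal, sketched below purely for illustration: use a
constant derivative while |net| <= r and fall back to the true (near-zero)
sigmoid derivative outside, so learning still switches off once the unit has
committed. The values of a, c and the constant are hypothetical; the post
only says c is a small multiplicative constant.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def piecewise_deriv(net, r, const=0.25):
        """Constant derivative inside [-r, +r]; the true (near-zero) sigmoid
        derivative outside, so saturated units stop learning."""
        out = sigmoid(net)
        true_deriv = out * (1.0 - out)
        return np.where(np.abs(net) <= r, const, true_deriv)

    # Illustrative choice of r: take a as the point where the sigmoid is
    # effectively 1 (f(4.6) ~ 0.99) and scale it by a constant c.
    a = 4.6
    c = 1.2                               # illustrative value only
    r = c * a
    print(piecewise_deriv(np.array([0.0, 3.0, 8.0]), r))
    # -> [0.25, 0.25, ~0.0003]: constant inside the range, ~zero once saturated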
- George Bolt
University of York, U.K.