Does backprop need the derivative ??
George Bolt
george at psychmips.york.ac.uk
Mon Feb 8 08:32:38 EST 1993
Heini Withagen wrote:
In his paper, 'An Empirical Study of Learning Speed in Back-Propagation
Networks', Scott E. Fahlman shows that with the encoder/decoder problem
it is possible to replace the derivative of the transfer function by
a constant. I have been able to reproduce this example. However, for
several other examples, it was not possible to get the network to
converge using a constant for the derivative.
- end quote -
I've looked at BP learning in MLPs w.r.t. fault tolerance and found
that the derivative of the transfer function acts to *stop* learning.
Once a unit's weights for some particular input (to that unit rather than
to the network) are sufficiently developed for it to decide whether to output
0 or 1, the weight changes become approximately zero because this derivative
is near zero. I would imagine that setting it to a constant will make an MLP
over-learn certain patterns and leave it unable to converge to a state of
equilibrium, i.e. one in which all patterns are matched to some degree.
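The effect can be seen in a toy calculation. Below is a minimal sketch (not
from the original post) in Python, assuming a sigmoid transfer function and a
hypothetical constant of 0.25: the standard error term error * f'(net)
vanishes once the unit saturates, while the constant-derivative variant keeps
pushing the weights.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Standard BP error term for a unit: f'(net) = f(net) * (1 - f(net))
    # shrinks toward zero once the unit saturates near 0 or 1, halting learning.
    def delta_standard(error, net):
        out = sigmoid(net)
        return error * out * (1.0 - out)

    # Constant-derivative variant: weight changes stay the same size even
    # after the unit has committed to outputting 0 or 1.
    def delta_constant(error, net, const=0.25):
        return error * const

    net, error = 6.0, 0.5                 # a strongly saturated unit
    print(delta_standard(error, net))     # ~0.0012 -- learning has effectively stopped
    print(delta_constant(error, net))     # 0.125   -- the unit keeps being pushed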
A better route would be to set the derivative function to a constant
over a range [-r, +r], where f( |r| ) -> 1.0. To make individual units
robust with respect to their weights, set r = c*a, where f( |a| ) -> 1.0
and c is a small constant multiplicative value.
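One way to read this proposal, sketched below purely for illustration: use a
constant derivative while |net| <= r and fall back to the true (near-zero)
sigmoid derivative outside, so learning still switches off once the unit has
committed. The values of a, c and the constant are hypothetical; the post
only says c is a small multiplicative constant.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def piecewise_deriv(net, r, const=0.25):
        """Constant derivative inside [-r, +r]; the true (near-zero) sigmoid
        derivative outside, so saturated units stop learning."""
        out = sigmoid(net)
        true_deriv = out * (1.0 - out)
        return np.where(np.abs(net) <= r, const, true_deriv)

    # Illustrative choice of r: take a as the point where the sigmoid is
    # effectively 1 (f(4.6) ~ 0.99) and scale it by a constant c.
    a = 4.6
    c = 1.2                               # illustrative value only
    r = c * a
    print(piecewise_deriv(np.array([0.0, 3.0, 8.0]), r))
    # -> [0.25, 0.25, ~0.0003]: constant inside the range, ~zero once saturated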
- George Bolt
University of York, U.K.