Summary of "Does backprop need the derivative ??"

Randy L. Shimabukuro shim at marlin.nosc.mil
Fri Feb 12 13:00:08 EST 1993


Congratulations on initiating a very lively discussion. From reading the
responses, though, it appears that people are interpreting your question
differently. At the risk of adding to the confusion, let me try to
explain.

It seems that some people are talking about the derivative of the
transfer function (F'), while others are talking about the gradient
of the error function. We have looked at both cases:

We approximate F' in a manner similar to that suggested by George Bolt:
we let F'(|x|) -> 1 for |x| < r and F'(|x|) -> a for |x| >= r, where a
is a small positive constant and r is a point where F'(r) is
approximately 1.
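
As a rough sketch, something like the following (in Python/NumPy; the
function name approx_fprime and the particular values of r and a below
are illustrative choices, not the values from our experiments):

    import numpy as np

    R = 4.0    # assumed threshold: |x| < R is treated as the active region
    A = 0.05   # assumed small positive constant for the saturated region

    def approx_fprime(x):
        # Piecewise-constant stand-in for F'(x): 1 for |x| < R, A otherwise.
        return np.where(np.abs(x) < R, 1.0, A)

    # In the backward pass this simply replaces the true derivative when
    # forming the deltas for a layer, for example:
    #   delta_hidden = approx_fprime(net_hidden) * (delta_output @ W_output.T)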

We have also, in a sense, approximated the gradient of the error
function by quantizing the weight updates. This is similar to what
Peterson and Hartman call "Manhattan updating". In this case it is
important to preserve the sign of the derivative.
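
A rough sketch of this kind of sign-only update (again, the names and
the step size are illustrative rather than the exact scheme we used):

    import numpy as np

    def manhattan_update(W, grad, step=0.01):
        # Ignore the gradient's magnitude and move each weight a fixed
        # amount in the direction opposite its gradient component's sign.
        return W - step * np.sign(grad)

    # Shrinking `step` over the course of training corresponds to decreasing
    # the size of the updates as learning progresses, mentioned below.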

We have found that the first type of approximation has very little
effect on back propagation. Depending on the problem, the second type
sometimes shortens the learning time and sometimes prevents the network
from learning. In some cases it helps to decrease the size of the
updates as learning progresses.

                        Randy Shimabukuro




