Does backprop need the derivative ??

Scott_Fahlman <Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU>
Sun Feb 7 13:02:42 EST 1993


    I happen to know it doesn't work for a more complicated encoder 
    problem: image compression. When Paul Munro & I were first doing
    image compression back in '86, the error would go down and then
    back up! Rumelhart said: "there's a bug in your code" and indeed
    there was: we left out the derivative on the hidden units. -g.
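
For concreteness, here is a minimal sketch (not the code referred to above;
the layer sizes, variable names, and learning rate are made up for
illustration) of one backprop step in a one-hidden-layer sigmoid
autoencoder, marking where the derivative on the hidden units enters:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, W1, W2, lr=0.1):
    # forward pass: input -> hidden code -> reconstruction
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    err = y - x                            # reconstruction error

    # output-layer delta: error times the sigmoid derivative y*(1-y)
    delta_out = err * y * (1.0 - y)

    # hidden-layer delta: back-propagated error times the hidden-unit
    # derivative h*(1-h); omitting the h*(1-h) factor is the bug
    # described in the quote above
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)

    # gradient-descent weight updates
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return 0.5 * np.sum(err ** 2)          # squared reconstruction error

# illustrative use on an 8-3-8 encoder-style net (sizes are arbitrary):
#   rng = np.random.default_rng(0)
#   x  = rng.random(8)
#   W1 = 0.1 * rng.standard_normal((3, 8))
#   W2 = 0.1 * rng.standard_normal((8, 3))
#   for _ in range(100):
#       print(backprop_step(x, W1, W2))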

I can see why using only a sign-preserving approximation to the sigmoid's
derivative, rather than the true derivative, might cause learning to bog
down, but I don't offhand see how it could cause the error to go up, at
least in a net with only one hidden layer and a monotonic activation
function.
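
By a sign-preserving approximation I mean something like the following
sketch: since the true sigmoid derivative h*(1-h) is strictly positive,
replacing it with a positive constant (1.0 here, as one illustrative
choice) leaves the sign of each hidden-unit delta unchanged and only
distorts its magnitude:

def hidden_delta(back_err, h, use_true_derivative=True):
    # back_err: error back-propagated from the output layer (W2.T @ delta_out)
    # h: hidden-unit activations (outputs of the logistic sigmoid)
    if use_true_derivative:
        return back_err * h * (1.0 - h)    # exact sigmoid derivative
    # positive constant in place of the derivative: each component keeps
    # its sign, but the relative magnitudes across units are wrong
    return back_err * 1.0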

I wonder if this problem would also occur in a net using the "sigmoid prime
offset", which adds a small constant to the derivative of the sigmoid.  I
haven't seen it.
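
For reference, the sigmoid prime offset is just the true derivative plus a
small constant, so the effective slope never goes all the way to zero when
a unit saturates. A one-line sketch (the 0.1 default is illustrative, not
canonical):

def sigmoid_prime_offset(h, offset=0.1):
    # derivative of the logistic sigmoid plus a small constant offset
    return h * (1.0 - h) + offset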

-- Scott

===========================================================================
Scott E. Fahlman			Internet:  sef+ at cs.cmu.edu
Senior Research Scientist		Phone:     412 268-2575
School of Computer Science              Fax:       412 681-5739
Carnegie Mellon University		Latitude:  40:26:33 N
5000 Forbes Avenue			Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================


