Does backprop need the derivative ??

Scott_Fahlman <Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU>
Sun Feb 7 13:02:42 EST 1993


    I happen to know it doesn't work for a more complicated encoder 
    problem: image compression. When Paul Munro & I were first doing
    image compression back in '86, the error would go down and then
    back up! Rumelhart said: "there's a bug in your code" and indeed
    there was: we left out the derivative on the hidden units. -g.
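
For concreteness, here is a minimal sketch (not the code referred to above;
the layer sizes, variable names, and learning rate are made up for
illustration) of one backprop step in a one-hidden-layer sigmoid
autoencoder, marking where the derivative on the hidden units enters:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, W1, W2, lr=0.1):
    # forward pass: input -> hidden code -> reconstruction
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    err = y - x                            # reconstruction error

    # output-layer delta: error times the sigmoid derivative y*(1-y)
    delta_out = err * y * (1.0 - y)

    # hidden-layer delta: back-propagated error times the hidden-unit
    # derivative h*(1-h); omitting the h*(1-h) factor is the bug
    # described in the quote above
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)

    # gradient-descent weight updates
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return 0.5 * np.sum(err ** 2)          # squared reconstruction error

# illustrative use on an 8-3-8 encoder-style net (sizes are arbitrary):
#   rng = np.random.default_rng(0)
#   x  = rng.random(8)
#   W1 = 0.1 * rng.standard_normal((3, 8))
#   W2 = 0.1 * rng.standard_normal((8, 3))
#   for _ in range(100):
#       print(backprop_step(x, W1, W2))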

I can see why using only a sign-preserving approximation to the sigmoid's
derivative, rather than the true derivative, might cause learning to bog
down, but I don't offhand see how it could cause the error to go up, at
least in a net with only one hidden layer and a monotonic activation
function.
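
By a sign-preserving approximation I mean something like the following
sketch: since the true sigmoid derivative h*(1-h) is strictly positive,
replacing it with a positive constant (1.0 here, as one illustrative
choice) leaves the sign of each hidden-unit delta unchanged and only
distorts its magnitude:

def hidden_delta(back_err, h, use_true_derivative=True):
    # back_err: error back-propagated from the output layer (W2.T @ delta_out)
    # h: hidden-unit activations (outputs of the logistic sigmoid)
    if use_true_derivative:
        return back_err * h * (1.0 - h)    # exact sigmoid derivative
    # positive constant in place of the derivative: each component keeps
    # its sign, but the relative magnitudes across units are wrong
    return back_err * 1.0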

I wonder if this problem would also occur in a net using the "sigmoid prime
offset", which adds a small constant to the derivative of the sigmoid.  I
haven't seen it.
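
For reference, the sigmoid prime offset is just the true derivative plus a
small constant, so the effective slope never goes all the way to zero when
a unit saturates. A one-line sketch (the 0.1 default is illustrative, not
canonical):

def sigmoid_prime_offset(h, offset=0.1):
    # derivative of the logistic sigmoid plus a small constant offset
    return h * (1.0 - h) + offset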

-- Scott

===========================================================================
Scott E. Fahlman			Internet:  sef+ at cs.cmu.edu
Senior Research Scientist		Phone:     412 268-2575
School of Computer Science              Fax:       412 681-5739
Carnegie Mellon University		Latitude:  40:26:33 N
5000 Forbes Avenue			Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================


