Does backprop need the derivative ??
Scott_Fahlman@SEF-PMAX.SLISP.CS.CMU.EDU
Sun Feb 7 13:02:42 EST 1993
> I happen to know it doesn't work for a more complicated encoder
> problem: image compression.  When Paul Munro & I were first doing
> image compression back in 86, the error would go down and then
> back up!  Rumelhart said: "there's a bug in your code" and indeed
> there was: we left out the derivative on the hidden units.  -g.
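[For concreteness, here is a minimal sketch, not from the original exchange, of
the hidden-layer update in question; the layer shapes and variable names are
illustrative assumptions.  The h * (1 - h) factor in delta_hid is the sigmoid
derivative on the hidden units that the quoted message describes leaving out.]

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, y, W1, W2):
        """One backprop step for a one-hidden-layer sigmoid net, squared error."""
        # Forward pass
        h = sigmoid(W1 @ x)          # hidden activations
        y_hat = sigmoid(W2 @ h)      # output activations

        # Output-layer delta: error times the sigmoid derivative at the output
        delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)

        # Hidden-layer delta: back-propagated error times the sigmoid
        # derivative h * (1 - h) at the hidden units.  Dropping the
        # h * (1 - h) factor is the bug described above.
        delta_hid = (W2.T @ delta_out) * h * (1.0 - h)

        # Gradients of the squared error w.r.t. the weight matrices
        grad_W2 = np.outer(delta_out, h)
        grad_W1 = np.outer(delta_hid, x)
        return grad_W1, grad_W2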
I can see why using not the true derivative of the sigmoid but only an
approximation that preserves its sign might cause learning to bog down,
but I don't offhand see how it could cause the error to go up, at least in
a net with only one hidden layer and a monotonic activation function.
I wonder if this problem would also occur in a net using the "sigmoid prime
offset", which adds a small constant to the derivative of the sigmoid. I
haven't seen it.
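[A minimal sketch of the sigmoid-prime-offset idea, continuing the illustrative
code above.  The message says only "a small constant"; the value 0.1 used here
is an assumption for the example.]

    SIGMOID_PRIME_OFFSET = 0.1   # a small constant; exact value assumed here

    def offset_sigmoid_prime(h):
        """True sigmoid derivative plus a small constant, so the back-propagated
        error is never multiplied by something near zero when a unit saturates."""
        return h * (1.0 - h) + SIGMOID_PRIME_OFFSET

    # Used in place of the plain derivative in the hidden-layer delta:
    #   delta_hid = (W2.T @ delta_out) * offset_sigmoid_prime(h)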
-- Scott
===========================================================================
Scott E. Fahlman Internet: sef+ at cs.cmu.edu
Senior Research Scientist Phone: 412 268-2575
School of Computer Science Fax: 412 681-5739
Carnegie Mellon University Latitude: 40:26:33 N
5000 Forbes Avenue Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================