Test & Derivatives in Backprop

john kolen kolen-j at cis.ohio-state.edu
Tue Feb 9 13:31:43 EST 1993


[I hope that this makes it to connectionists, the last couple of postings
 haven't made it back.  So I have summarized these replies in one message
 for general consumption.]

Regarding the latest talk about derivatives in backprop, I had looked into
replacing the different mathematical operations with other, more
implementation-amenable operations.  This included replacing the
derivative of the squashing function with d(x)=min(x,1-x), where x is the
unit's activation.  The results of these tests show that backprop is pretty
stable as long as the qualitative shape of the operations is maintained.
If you replace the derivative with a constant or a linear (wrt activation)
function, it doesn't work at all for the learning tasks I considered.  As
long as the derivative replacement is minimal at the extreme activations
and maximal at 0.5 (wrt the traditional sigmoid), learning will not suffer
dramatically.
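
To make the comparison concrete, here is a minimal sketch in Python that
puts the usual logistic derivative, written in terms of the activation as
a(1-a), next to the replacement d(x)=min(x,1-x).  The function names and
the output_delta helper are illustrative assumptions, not code from the
original tests; the sketch only shows that both functions vanish at the
extreme activations and peak at 0.5.

```python
def sigmoid_derivative(a):
    """Logistic derivative expressed via the activation a in [0, 1]."""
    return a * (1.0 - a)


def piecewise_linear_derivative(a):
    """The replacement d(x) = min(x, 1 - x) described in the post."""
    return min(a, 1.0 - a)


def output_delta(target, activation, derivative):
    """Output-unit error term with a pluggable derivative.

    Hypothetical helper for illustration: the usual delta rule term
    (target - activation) * f'(net), with f' supplied by the caller.
    """
    return (target - activation) * derivative(activation)


# Tabulate both functions over the activation range.
activations = [i / 10 for i in range(11)]
for a in activations:
    print(f"{a:.1f}  {sigmoid_derivative(a):.3f}  "
          f"{piecewise_linear_derivative(a):.3f}")
```

Both curves share the same qualitative shape, but the replacement peaks at
0.5 where the true derivative peaks at 0.25, so the two differ in scale as
well as in smoothness.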

After reading Fahlman's observation about losing bits to noise, I had the
following response.  Bits come from binary decisions.  Analog systems
don't do that in normal processing; normally some continuous value affects
another continuous value.  Nowhere do they perform A/D conversion and then
operate on the bits.  If there is no measurement device, then talking about
bits doesn't make sense.

John Kolen


