Does backprop need the derivative??
    meng@spring.kuee.kyoto-u.ac.jp 
       
    Wed Feb 10 11:58:19 EST 1993
    
    
  
Thinking about it, it seems that the derivative can always be replaced
by a sufficiently small constant. That is, for a given training set and
a given precision requirement on the output units, there is a threshold
constant such that any constant below it will, from the same starting
point and for the same network, find the same minimum as an algorithm
that uses the derivative. The problem, of course, is that this constant
may be so small that training time becomes prohibitive, whereas the
motivation for using a constant in the first place is to speed up
training. I think the reason this works in many cases is that the
precision requirement is loose enough to let the network jump into a
region that is sufficiently close to a minimum.
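A minimal sketch of this claim, under assumptions that are not in the post: "the derivative" is taken to be the sigmoid derivative o(1 - o) that multiplies the output error in the delta rule, the network is a single sigmoid unit (no hidden layer) learning logical AND, and the precision requirement is that every output lies within 0.2 of its target. The names `train` and `PATTERNS`, the learning rate, and the constant 0.25 (the largest value o(1 - o) can take) are hypothetical choices for illustration only.

```python
import math

# Hypothetical toy training set: logical AND. The third input is a constant
# bias input of 1.0.
PATTERNS = [((0.0, 0.0, 1.0), 0.0),
            ((0.0, 1.0, 1.0), 0.0),
            ((1.0, 0.0, 1.0), 0.0),
            ((1.0, 1.0, 1.0), 1.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(derivative, lr=1.0, precision=0.2, max_epochs=100_000):
    """Batch delta-rule training of one sigmoid unit (illustrative sketch).

    `derivative(o)` supplies the factor that multiplies the output error:
    o * (1 - o) for ordinary backprop, or a fixed constant.
    Returns the number of epochs until every output is within `precision`
    of its target, or None if `max_epochs` is reached first.
    """
    w = [0.0, 0.0, 0.0]
    for epoch in range(max_epochs):
        grad = [0.0, 0.0, 0.0]
        worst = 0.0
        for x, t in PATTERNS:
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            worst = max(worst, abs(t - o))
            delta = (t - o) * derivative(o)  # output-unit delta
            for i, xi in enumerate(x):
                grad[i] += delta * xi
        if worst < precision:  # precision requirement met on all patterns
            return epoch
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return None


with_derivative = train(lambda o: o * (1.0 - o))
with_constant = train(lambda o: 0.25)  # assumed constant: max of o(1 - o)
print(with_derivative, with_constant)
```

Both runs should meet the precision requirement; the epoch counts depend on the learning rate and the constant. For this single-layer case there is a simple reason the constant works: replacing o(1 - o) by a constant c makes the update proportional to (t - o) x, which is exactly gradient descent on the cross-entropy error, scaled by c. In a multilayer net the derivative also appears in the hidden-unit deltas, which this sketch does not cover.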

It would not work when the network is moving in the right direction
but jumping too far, i.e., alternately jumping from one side of a
valley to the other and never landing within a region that gives
convergence within the required precision. Using the derivative solves
this, because the step gets smaller as the network approaches a
minimum.
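The oscillation argument can be illustrated on a one-dimensional toy error surface E(w) = (w - 3)^2. As a simplification not taken from the post, "using a constant" is modelled here as a fixed-size step in the downhill direction, so only the sign of the slope is used. The tolerance of 0.05, the learning rate 0.1, the step sizes 0.4 and 0.5, and the helper `descend` are all hypothetical.

```python
# Hypothetical toy error surface E(w) = (w - W_STAR)**2, minimum at W_STAR.
W_STAR = 3.0


def slope(w):
    return 2.0 * (w - W_STAR)  # dE/dw


def sign(x):
    return (x > 0) - (x < 0)  # -1, 0 or 1


def descend(step, w0=0.0, tol=0.05, max_steps=1000):
    """Apply w <- w - step(w) until |w - W_STAR| < tol.

    Returns (converged, number_of_steps, final_w).
    """
    w = w0
    for n in range(max_steps):
        if abs(w - W_STAR) < tol:
            return True, n, w
        w -= step(w)
    return abs(w - W_STAR) < tol, max_steps, w


# Step proportional to the derivative: it shrinks near the minimum.
by_derivative = descend(lambda w: 0.1 * slope(w))
# Constant step downhill: only the sign of the slope is used.
const_04 = descend(lambda w: 0.4 * sign(slope(w)))
const_05 = descend(lambda w: 0.5 * sign(slope(w)))
print(by_derivative[:2], const_04[:2], const_05[:2])
# → (True, 19) (False, 1000) (True, 6)
```

With the derivative-proportional step, the distance to the minimum shrinks by a factor of 0.8 per step. With a constant of 0.4 the weight reaches 2.8 and then alternates between about 2.8 and 3.2, never coming within 0.05 of the minimum. A constant of 0.5 from the same start lands exactly on 3.0, which is the lucky "bull's eye" case the post mentions, where success depends on how the initial state lines up with the constant.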

Another possibility is that, with a constant, the network might settle
in a different minimum (or try to settle in a different, "wider"
minimum) because it "sees" the error surface as more coarse-grained
than the derivative-based version does. In some cases, if you are
lucky (i.e., you have a good initial state relative to both the minimum
and the constant you are using), you might hit the bull's eye; with a
different initial state you might oscillate around the solution (i.e.,
the error goes up and down without ever getting within the required
limit). In such a case you could switch to using the derivative, or
simply decrease the constant (perhaps the amount of the decrease could
be computed from the increase in error? Just an idea).
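The suggestion to decrease the constant, perhaps by an amount computed from the increase in error, could look like the following sketch on a one-dimensional toy surface E(w) = (w - 3)^2 (an assumption for illustration). The specific rule, multiplying the constant by min(0.5, E_old / E_new) whenever the error fails to drop, is an arbitrary illustrative choice, as are the starting constant 0.7, the tolerance 0.05, and the name `shrinking_constant_descent`.

```python
# Hypothetical toy error surface E(w) = (w - W_STAR)**2, minimum at W_STAR.
W_STAR = 3.0


def error(w):
    return (w - W_STAR) ** 2


def sign(x):
    return (x > 0) - (x < 0)  # -1, 0 or 1


def shrinking_constant_descent(c=0.7, w0=0.0, tol=0.05, max_steps=1000):
    """Constant-size downhill steps; shrink the constant when error fails to drop.

    Returns (converged, steps, final_c).
    """
    w, e = w0, error(w0)
    for n in range(max_steps):
        if abs(w - W_STAR) < tol:
            return True, n, c
        w -= c * sign(2.0 * (w - W_STAR))  # step against the slope dE/dw
        e_new = error(w)
        if e_new >= e:
            # Overshoot: shrink c, more strongly the larger the jump in error
            # (assumed rule; halving is the minimum reduction).
            c *= min(0.5, e / e_new) if e_new > 0 else 0.5
        e = e_new
    return abs(w - W_STAR) < tol, max_steps, c


print(shrinking_constant_descent())
```

With the constant held fixed at 0.7, the weight would bounce between 2.8 and 3.5 indefinitely. With the shrinking rule, the first overshoot (error rising from 0.04 to 0.25) cuts the constant to about 0.112, and the weight then settles within the tolerance. Treating equal error as a failure to drop covers a perfectly symmetric oscillation, where the error never strictly increases.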

These are just some thoughts on the subject; I have not undertaken any
empirical study.
Tore
    
    
More information about the Connectionists mailing list