Re. does bp need the derivative?
E. Hartman
hartman%pav.mcc.com at mcc.com
Sat Feb 13 17:36:04 EST 1993
Regarding the question of the derivative in backprop, Javier Movellan
and Randy Shimabukuro mentioned the "Manhattan updating" discussed
in Peterson and Hartman ("Explorations of the Mean Field Theory
Learning Algorithm", Neural Networks, Vol. 2, pp. 475-494, 1989).
This technique computes the gradient exactly, but then keeps only
the signs of the components and takes fixed-size weight steps (each
weight is changed by a fixed amount, either up or down).
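For concreteness, here is a minimal sketch of the update in Python
(an illustration only, not code from the paper; the names "weights",
"grad", and "step_size" are hypothetical):

    import numpy as np

    def manhattan_update(weights, grad, step_size):
        # Keep only the sign of each exactly-computed gradient component
        # and move every weight by the same fixed amount, up or down.
        return weights - step_size * np.sign(grad)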
We used this technique to advantage, in both backprop and mean field
theory nets, on problems with inconsistent data -- data containing
exemplars with identical inputs but differing outputs (a one-to-many
mapping). (The problem in the paper was a classification problem
drawn from overlapping Gaussian distributions.)
The reason this technique helped on this kind of problem is the
following. Since the data was highly inconsistent, we found that,
before taking a step in weight space, it helped to average out the
data inconsistencies by accumulating the gradient over a large number
of patterns (large-batch training). Typically, however, some
components of the gradient don't "average out" nicely and instead
become very large. The components of the gradient therefore vary
greatly in magnitude, which makes choosing a good learning rate
difficult. "Manhattan updating" makes all the components equal in
magnitude. We found it necessary to slowly reduce the step size as
training proceeds.
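A hedged sketch of how these pieces fit together (hypothetical names;
"grad_fn" stands in for whatever computes the exact per-pattern
gradient, and the decay factor is only illustrative):

    import numpy as np

    def train_step(weights, batch, grad_fn, step_size):
        # Accumulate the exact gradient over a large batch so that
        # inconsistent targets for identical inputs can average out.
        total_grad = np.zeros_like(weights)
        for x, target in batch:
            total_grad += grad_fn(weights, x, target)
        # Discard the (possibly very uneven) component magnitudes and
        # take a fixed-size step along the sign of each component.
        return weights - step_size * np.sign(total_grad)

    # Slowly reduce step_size as training proceeds, e.g.
    #     step_size *= 0.99    (decay factor chosen arbitrarily here)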
Eric Hartman