Re. does bp need the derivative?
E. Hartman
hartman%pav.mcc.com at mcc.com
Sat Feb 13 17:36:04 EST 1993
Regarding the question of the derivative in backprop, Javier Movellan
and Randy Shimabukuro mentioned the "Manhattan updating" discussed
in Peterson and Hartman ("Explorations of the Mean Field Theory
Learning Algorithm", Neural Networks, Vol. 2, pp. 475-494, 1989).
This technique computes the gradient exactly, but then keeps only
the signs of the components and takes fixed-size weight steps (each
weight is changed by a fixed amount, either up or down).
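For concreteness, here is a minimal sketch of the update in Python
(an illustration only, not code from the paper; the names "weights",
"grad", and "step_size" are hypothetical):

    import numpy as np

    def manhattan_update(weights, grad, step_size):
        # Keep only the sign of each exactly-computed gradient component
        # and move every weight by the same fixed amount, up or down.
        return weights - step_size * np.sign(grad)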
We used this technique to advantage, in both backprop and mean field
theory nets, on problems with inconsistent data -- data containing
exemplars with identical inputs but differing outputs (a one-to-many
mapping). (The problem in the paper was a classification problem
drawn from overlapping Gaussian distributions.)
The reason this technique helped on this kind of problem is the
following. Since the data was highly inconsistent, we found that,
before taking a step in weight space, it helped to average out the
data inconsistencies by accumulating the gradient over a large number
of patterns (large-batch training). Typically, however, some
components of the gradient don't "average out" nicely and instead
become very large. The components of the gradient therefore vary
greatly in magnitude, which makes choosing a good learning rate
difficult. "Manhattan updating" makes all the components equal in
magnitude. We found it necessary to slowly reduce the step size as
training proceeds.
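A hedged sketch of how these pieces fit together (hypothetical names;
"grad_fn" stands in for whatever computes the exact per-pattern
gradient, and the decay factor is only illustrative):

    import numpy as np

    def train_step(weights, batch, grad_fn, step_size):
        # Accumulate the exact gradient over a large batch so that
        # inconsistent targets for identical inputs can average out.
        total_grad = np.zeros_like(weights)
        for x, target in batch:
            total_grad += grad_fn(weights, x, target)
        # Discard the (possibly very uneven) component magnitudes and
        # take a fixed-size step along the sign of each component.
        return weights - step_size * np.sign(total_grad)

    # Slowly reduce step_size as training proceeds, e.g.
    #     step_size *= 0.99    (decay factor chosen arbitrarily here)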
Eric Hartman