Quickprop.

jon@cs.flinders.oz.au
Wed Nov 6 18:38:37 EST 1991


    
My Quickprop algorithm is pretty close to what you describe here, except
that it uses only the diagonal terms of the second derivative (i.e. it
pretends that the weight updates do not affect one another).  If you
haven't seen the paper on this, it's in neuroprose as
"fahlman.quickprop-tr.ps.Z" or something close to that.  It works well --
in the few cases I have seen in which both quickprop and conjugate gradient
were used on the same problems, quickprop is considerably faster (though in
very high-dimensional spaces, CG might win).
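
In case it helps, the heart of the per-weight update looks roughly like
this (a simplified sketch, not the code from the tech report; the bootstrap
rate epsilon and some of the bookkeeping are glossed over):

    def quickprop_step(grad, prev_grad, prev_delta, epsilon=0.5):
        # One Quickprop update for a single weight.
        #   grad       -- dE/dw at the current epoch
        #   prev_grad  -- dE/dw at the previous epoch
        #   prev_delta -- the weight change made at the previous epoch
        #   epsilon    -- ordinary gradient-descent rate for bootstrap steps
        if prev_delta == 0.0:
            # Nothing to extrapolate from yet: take a plain gradient step.
            return -epsilon * grad
        denom = prev_grad - grad
        if denom == 0.0:
            # Gradient unchanged; again fall back to a plain gradient step.
            return -epsilon * grad
        # Fit a parabola through the two gradient measurements for this
        # weight alone (the diagonal second-derivative approximation,
        # pretending the other weights stay fixed) and jump to its minimum.
        step = prev_delta * grad / denom
        # Cap the step with a "maximum growth factor" mu so that nearly
        # equal gradients cannot produce an enormous jump.
        mu = 1.75
        if abs(step) > mu * abs(prev_delta):
            step = mu * abs(prev_delta) * (1.0 if step > 0 else -1.0)
        return step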

Yann LeCun has used a slightly different version of the same idea: he
back-propagates second-derivative information for each case, and uses this
to dynamically adjust the learning rate.
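
Roughly (my paraphrase of Yann's scheme, not his code, and the constants
here are mine), each weight gets its own rate, shrunk wherever the
backpropagated curvature estimate is large:

    def per_weight_rate(eta, h, mu=0.1):
        # eta -- global learning rate
        # h   -- backpropagated estimate of d2E/dw2 for this weight
        # mu  -- small constant that keeps the rate bounded where the
        #        estimated curvature is close to zero
        return eta / (abs(h) + mu)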

-- Scott Fahlman

Thanks for the info. I'll grab your paper out of Neuroprose and give it a 
read. Have you also done anything on keeping the magnitude of the error
gradient constant? Doing this makes a lot of sense to me, since it is only
the direction of the next jump in weight space that matters; in particular,
if one uses delta(w) = -alpha*grad(E), then flat regions cause very slow
progress and steep regions may cause one to move too fast.
delta(w) = -alpha*grad(E)/||grad(E)|| gives one a lot more control over the
learning rate.
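
Concretely, in Python-ish pseudocode (the small eps is just my guard
against dividing by zero on a perfectly flat plateau):

    import math

    def normalised_step(grad_E, alpha, eps=1e-12):
        # grad_E is the list of partial derivatives dE/dw_i.
        # Returns a step of fixed length alpha in the downhill direction,
        # however steep or flat the error surface is at the current point.
        norm = math.sqrt(sum(g * g for g in grad_E)) + eps
        return [-alpha * g / norm for g in grad_E]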

Jon Baxter

