No subject

Scott_Fahlman at SEF-PMAX.SLISP.CS.CMU.EDU
Tue Nov 5 14:55:45 EST 1991


    Another idea is to calculate the matrix of second derivatives (grad(grad E)) as
    well as the first derivatives (grad E) and from this information calculate the
    (unique) parabolic surface in weight space that has the same derivatives. Then
    the weights should be updated so as to jump to the center (minimum) of the
    parabola. I haven't coded this idea yet; has anyone else looked at this kind
    of thing, and if so, what are the results?
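
In other words, the proposal is the full Newton step on the local
quadratic (parabolic) model of E.  A rough NumPy-style sketch of that
jump, purely for illustration (the names are mine, and a real
implementation would have to worry about the Hessian not being positive
definite):

    import numpy as np

    def newton_step(w, grad, hess):
        # Local quadratic model:  E(w + d) ~ E(w) + grad.d + 0.5 d'Hd
        # Its stationary point (the minimum when H is positive definite)
        # is at d = -inv(H) grad, so jump straight there.
        d = -np.linalg.solve(hess, grad)
        return w + d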
    
My Quickprop algorithm is pretty close to what you describe here, except
that it uses only the diagonal terms of the second derivative (i.e. it
pretends that the weight updates do not affect one another).  If you
haven't seen the paper on this, it's in neuroprose as
"fahlman.quickprop-tr.ps.Z" or something close to that.  It works well --
in the few cases I have seen in which both quickprop and conjugate gradient
were used on the same problems, quickprop was considerably faster (though in
very high-dimensional spaces, CG might win).
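
For those who haven't seen the TR, the heart of the per-weight update is
roughly the following (a simplified sketch in NumPy-ish Python, leaving out
several safeguards that the real algorithm needs; parameter names and
default values here are only illustrative).  Here grad and prev_grad are
dE/dw at the current and previous weight vectors, and prev_step is the
weight change applied on the previous step:

    import numpy as np

    def quickprop_step(grad, prev_grad, prev_step, epsilon=0.5, mu=1.75):
        # For each weight independently, fit a parabola through the
        # previous and current slopes (a secant estimate of the diagonal
        # second derivative) and jump toward its minimum.
        denom = prev_grad - grad
        have_history = (prev_step != 0.0) & (denom != 0.0)
        secant = grad / np.where(denom != 0.0, denom, 1.0) * prev_step
        step = np.where(have_history, secant, -epsilon * grad)
        # Cap the step so a nearly flat parabola can't cause a wild jump.
        cap = mu * np.abs(prev_step) + epsilon * np.abs(grad)
        return np.clip(step, -cap, cap)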

Yann LeCun has used a slightly different version of the same idea: he
back-propagates second-derivative information for each case, and uses this
to dynamically adjust the learning rate.
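
If I have the details right, the effect is to give each weight its own
step size, scaled down where the estimated curvature is large -- very
roughly like the sketch below (eta and mu are placeholders, not his
actual values):

    import numpy as np

    def per_weight_rates(diag_hess, eta=0.01, mu=0.1):
        # Larger |d2E/dw2| (sharper curvature) -> smaller learning rate;
        # mu keeps the rate bounded where the curvature estimate is ~0.
        return eta / (np.abs(diag_hess) + mu)

    # usage: weights -= per_weight_rates(diag_hess) * grad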

-- Scott Fahlman

