second derivatives

Fri Nov 8 10:02:15 EST 1991

>    Another idea is to calculate the matrix of second derivatives (grad(grad E)) as
>    well as the first derivatives (grad E) and from this information calculate the
>    (unique) parabolic surface in weight space that has the same derivatives. Then
>    the weights should be updated so as to jump to the center (minimum) of the
>    parabola. I haven't coded this idea yet, has anyone else looked at this kind
>    of thing, and if so what are the results?

>-- Scott Fahlman
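
The update described in the quote is, in effect, a Newton step: solve
(grad(grad E)) * dw = grad E and move to w - dw. A minimal sketch on a toy
two-weight quadratic error follows. The names solve_2x2, newton_step, toy_grad
and toy_hess are hypothetical and chosen only for illustration, and the toy
error surface is an assumption, not a network from this thread.

```python
# Hypothetical illustration: one Newton step on a two-weight quadratic error.
# Toy error (an assumption): E(w) = u^2 + 2*v^2 + u*v, with u = w0 - 1, v = w1 + 3.

def solve_2x2(h, g):
    """Solve h * x = g for a 2x2 matrix h (list of rows) by Cramer's rule."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular second-derivative matrix: no unique center")
    return [(g[0] * d - b * g[1]) / det, (a * g[1] - c * g[0]) / det]

def newton_step(w, grad, hess):
    """Jump to the stationary point of the local parabola fitted at w."""
    dw = solve_2x2(hess(w), grad(w))
    return [w[0] - dw[0], w[1] - dw[1]]

def toy_grad(w):
    u, v = w[0] - 1.0, w[1] + 3.0
    return [2.0 * u + v, 4.0 * v + u]

def toy_hess(w):
    # Constant for a quadratic; positive definite here, so the center is a minimum.
    return [[2.0, 1.0], [1.0, 4.0]]

w_new = newton_step([5.0, 5.0], toy_grad, toy_hess)
print(w_new)  # lands at the minimum (1, -3), up to rounding
```

On a true quadratic a single step reaches the minimum. On a real network error
surface the parabola is only a local fit, and if the second-derivative matrix
is not positive definite the "center" may be a saddle or a maximum rather than
a minimum.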

I don't know of work on exactly this idea, or about the method used by Le Cun,
but Dr. Sholom Weiss of Rutgers University (weiss at cs.rutgers.edu) has
developed an efficient method for calculating the second derivatives using
Monte Carlo methods. The second derivatives are then used within a stiff
differential equation solver to optimize the weights by solving the BP
differential equations directly. The results on different datasets (e.g. the
Peterson-Barney vowel dataset and Robinson's vowel dataset) are superior to
other results not only in terms of training, but also in terms of
generalization.
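
To show why second derivatives help when the BP equations are treated as a
differential equation dw/dt = -grad E(w), here is a minimal sketch using a
linearized backward Euler step, (I + h*H) dw = -h*grad E, where H is the matrix
of second derivatives. This is an assumed, generic stiff-ODE scheme and not
necessarily the solver described above. The second derivatives are written out
analytically for a toy quadratic rather than estimated by Monte Carlo, and all
function names are hypothetical.

```python
# Hypothetical illustration: gradient flow dw/dt = -grad E(w) on a stiff toy error.
# Toy error (an assumption): E(w) = 0.5 * (w0^2 + 1000 * w1^2).

def solve_2x2(m, r):
    """Solve m * x = r for a 2x2 matrix m (list of rows) by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]

def grad(w):
    return [w[0], 1000.0 * w[1]]

def hess(w):
    # Curvatures differ by a factor of 1000, which makes the ODE stiff.
    return [[1.0, 0.0], [0.0, 1000.0]]

def explicit_euler_step(w, h):
    """Plain gradient descent with step size h."""
    g = grad(w)
    return [w[0] - h * g[0], w[1] - h * g[1]]

def linearized_implicit_step(w, h):
    """Solve (I + h*H) dw = -h*g, then return w + dw."""
    g, H = grad(w), hess(w)
    m = [[1.0 + h * H[0][0], h * H[0][1]],
         [h * H[1][0], 1.0 + h * H[1][1]]]
    dw = solve_2x2(m, [-h * g[0], -h * g[1]])
    return [w[0] + dw[0], w[1] + dw[1]]

w_exp = [1.0, 1.0]
w_imp = [1.0, 1.0]
for _ in range(20):
    w_exp = explicit_euler_step(w_exp, 0.1)
    w_imp = linearized_implicit_step(w_imp, 0.1)

print(w_exp)  # the steep direction w1 blows up at this step size
print(w_imp)  # both weights decay toward the minimum at (0, 0)
```

With h = 0.1 the explicit step multiplies w1 by -99 each iteration and
diverges, while the implicit step divides it by 101 and stays stable. That
stability at large step sizes is the usual reason for reaching for a stiff
solver.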

--nitin