Preprint Available

Barak Pearlmutter bap at learning.siemens.com
Mon Jun 7 11:22:41 EDT 1993


I have placed the preprint whose abstract appears below in the
neuroprose archives.  My thanks to Jordan Pollack for providing this
valuable service to the community.

			   ----------------

	       Fast Exact Multiplication by the Hessian
			 Barak A. Pearlmutter

Just storing the Hessian $H$ (the matrix of second derivatives of the
error $E$ with respect to each pair of weights) of a large neural
network is difficult.  Since a common use of a large matrix like $H$
is to compute its product with various vectors, we derive a technique
that directly calculates $Hv$, where $v$ is an arbitrary vector.  To
calculate $Hv$, we first define a differential operator $R{f(w)} =
(d/dr) f(w+rv) |_{r=0}$, note that $R{dE/dw} = Hv$ and $R{w} = v$,
and then apply $R{}$ to the equations used to compute $dE/dw$.  The
result is an exact and numerically stable procedure for computing
$Hv$, which takes about as much computation, and is about as local, as
a gradient evaluation.  We then apply the technique to a one pass
gradient calculation algorithm (backpropagation), a relaxation
gradient calculation algorithm (recurrent backpropagation), and two
stochastic gradient calculation algorithms (Boltzmann Machines and
weight perturbation).  Finally, we show that this technique can be
used at the heart of many iterative techniques for computing various
properties of $H$, obviating any need to calculate the full Hessian.
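
(Both identities follow directly from the definition of $R{}$ and the
chain rule: $R{dE/dw} = (d/dr) (dE/dw)(w+rv) |_{r=0} = Hv$, and
$R{w} = (d/dr) (w+rv) |_{r=0} = v$.)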

[12 pages; 42k; pearlmutter.hessian.ps.Z; To appear in Neural Computation]
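
For readers who want to experiment with the idea, the same
forward-over-reverse construction can be expressed with a modern
automatic differentiation library.  The sketch below uses JAX and is
not the paper's own code; the toy quadratic loss, shapes, and function
names are illustrative assumptions, but the jvp-of-grad composition is
exactly the $R{dE/dw} = Hv$ trick described in the abstract.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Toy squared error of a linear model; stands in for a network's E(w).
        pred = x @ w
        return 0.5 * jnp.sum((pred - y) ** 2)

    def hessian_vector_product(w, v, x, y):
        # R{dE/dw} = (d/dr) (dE/dw)(w + r v) |_{r=0} = H v:
        # a forward-mode directional derivative of the reverse-mode gradient,
        # costing roughly one extra gradient evaluation and never forming H.
        grad_fn = lambda w_: jax.grad(loss)(w_, x, y)
        _, hv = jax.jvp(grad_fn, (w,), (v,))
        return hv

    # Small illustrative check: for this quadratic loss, H = x^T x.
    x = jax.random.normal(jax.random.PRNGKey(0), (8, 3))
    y = jax.random.normal(jax.random.PRNGKey(1), (8,))
    w = jnp.zeros(3)
    v = jnp.array([1.0, 0.0, 0.0])
    print(jnp.allclose(hessian_vector_product(w, v, x, y), (x.T @ x) @ v))

Because only $Hv$ products are needed, a routine like this can be
dropped into Lanczos or conjugate-gradient style iterations to estimate
extreme eigenvalues of $H$ or to solve linear systems in $H$ without
ever computing or storing the full Hessian.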

