second derivatives and the back propagation network
Aaron Owens
owens at eplrx7.es.duPont.com
Tue Nov 12 12:03:25 EST 1991
RE: Second Derivatives and Stiff ODEs for Back Prop Training
Several threads in this newsgroup recently have mentioned the use
of second derivative information (i.e., the Hessian or Jacobian
matrix) and/or stiff ordinary differential equations [ODEs] in
the training of the back propagation network [BPN].
[-- Aside: Stiff differential equation solvers derive
their speed and accuracy by specifically utilizing
the information contained in the second-derivative
Jacobian matrix. -- ]
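To make the aside concrete (my notation, not the original post's): an
implicit step such as backward Euler for a system dw/dt = f(w) is computed
by solving a linear system built from the Jacobian J = \partial f/\partial w,

    w_{n+1} = w_n + h\, f(w_{n+1})
    \quad\Longrightarrow\quad
    \bigl(I - h\, J(w_n)\bigr)\,\Delta w = h\, f(w_n),

which is why stiff codes ask for, and profit from, an accurate Jacobian.
For the weight-training equations discussed below, f is the negative error
gradient, so J is (minus) the matrix of second derivatives of the error.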
This is to confirm our experience that training the BPN using
second-derivative methods in general, and stiff ODE solvers in
particular, is extremely fast and efficient for problems which
are small enough (i.e., up to about 1000 connection weights) to
allow the Jacobian matrix [size = (number of weights)**2] to be
stored in the computer's real memory. "Stiff" backprop is
particularly well-suited to real-valued function mappings in
which a high degree of accuracy is required.
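As an illustration only -- a minimal sketch in Python/SciPy, assuming a toy
1-5-1 tanh network and synthetic data, not the Du Pont code -- "stiff"
backprop amounts to integrating the gradient-flow equations dw/dt = -dE/dw
with a stiff integrator such as SciPy's BDF method:

import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))   # 50 one-dimensional training inputs
Y = np.sin(np.pi * X)                       # real-valued target function

n_hidden = 5                                # 1-5-1 network: 16 weights in all

def unpack(w):
    # Split the flat weight vector into layer matrices and bias vectors.
    W1 = w[:n_hidden].reshape(1, n_hidden)
    b1 = w[n_hidden:2 * n_hidden]
    W2 = w[2 * n_hidden:3 * n_hidden].reshape(n_hidden, 1)
    b2 = w[3 * n_hidden:]
    return W1, b1, W2, b2

def sse_gradient(w):
    # Analytic gradient of the sum-of-squared-errors, by the chain rule (backprop).
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)                # hidden-layer activations
    p = h @ W2 + b2                         # linear output layer
    r = p - Y                               # prediction residuals
    dW2 = h.T @ r
    db2 = r.sum(axis=0)
    dh = (r @ W2.T) * (1.0 - h ** 2)        # back-propagate through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    return np.concatenate([dW1.ravel(), db1, dW2.ravel(), db2])

def rhs(t, w):
    # Gradient flow: the weights move downhill in continuous "training time".
    return -sse_gradient(w)

w0 = rng.normal(scale=0.3, size=3 * n_hidden + 1)
# method="BDF" is a stiff solver; it builds (here by numerical differencing) the
# Jacobian of the right-hand side -- the second-derivative matrix of this post.
sol = solve_ivp(rhs, (0.0, 1e3), w0, method="BDF", rtol=1e-6, atol=1e-9)
w_final = sol.y[:, -1]

In this form the solver estimates the Jacobian numerically; the point made
below is that computing it analytically is faster and more accurate.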
We have been using this method successfully in most of our production
applications for several years. See the abstracts below of a paper
presented at the 1989 IJCNN in Washington and of a recently-issued
U. S. patent.
It is possible -- and desirable -- to use the back error propagation
methodology (i.e., the chain rule of calculus) to explicitly
compute the second derivative of the sum_of_squared_prediction_error
with respect to the weights (i.e., the Jacobian matrix) analytically.
Using an analytic Jacobian, rather than computing the second
derivatives numerically [or -- an UNVERIFIED personal hypothesis --
stochastically], increases the algorithm's speed and accuracy
significantly.
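Continuing the sketch above (same toy network; this reuses X, w0, unpack and
rhs, and is an illustration rather than the patented algorithm): one simple
way to hand the stiff solver an analytic second-derivative matrix is the
Gauss-Newton form H ~ Jr'Jr, where Jr, the Jacobian of the residuals with
respect to the weights, comes straight from the chain rule. The paper and
patent work with the exact second derivatives; the Gauss-Newton matrix used
here is an approximation that becomes accurate near a least-squares minimum.

def residual_jacobian(w):
    # d(residual_p)/d(weight_k), one row per training pattern, via the chain rule.
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)                # (50, 5) hidden activations
    dtanh = 1.0 - h ** 2                    # tanh' at the hidden units
    d_W1 = X * (W2.ravel() * dtanh)         # dp/dW1_j = x * W2_j * tanh'_j
    d_b1 = W2.ravel() * dtanh               # dp/db1_j = W2_j * tanh'_j
    d_W2 = h                                # dp/dW2_j = h_j
    d_b2 = np.ones((X.shape[0], 1))         # dp/db2   = 1
    return np.hstack([d_W1, d_b1, d_W2, d_b2])   # (50, 16), columns ordered as in unpack()

def rhs_jacobian(t, w):
    # Jacobian of dw/dt = -grad E(w): the negative (approximate) Hessian of E.
    Jr = residual_jacobian(w)
    return -(Jr.T @ Jr)

sol = solve_ivp(rhs, (0.0, 1e3), w0, method="BDF",
                jac=rhs_jacobian, rtol=1e-6, atol=1e-9)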
-- Aaron --
Aaron J. Owens
Du Pont Neural Network Technology Center
P. O. B. 80357
Wilmington, DE 19880-0357
Telephone Numbers:
Office (302) 695-7341 (Phone & FAX)
Home " 738-5413
Internet: owens at esvax.dnet.dupont.com
---------- IJCNN '89 paper abstract ------------
EFFICIENT TRAINING OF THE BACK PROPAGATION NETWORK BY SOLVING A
SYSTEM OF STIFF ORDINARY DIFFERENTIAL EQUATIONS
A. J. Owens and D. L. Filkin
Central Research and Development Department
P. O. Box 80320
E. I. du Pont de Nemours and Company (Inc.)
Wilmington, DE 19880-0320
International Joint Conference on Neural Networks
June 19-22, 1989, Washington, DC
Volume II, pp. 381-386
Abstract. The training of back propagation networks involves
adjusting the weights between the computing nodes in the artificial
neural network to minimize the errors between the network's
predictions and the known outputs in the training set. This
least-squares minimization problem is conventionally solved by an
iterative fixed-step technique, using gradient descent, which
occasionally exhibits instabilities and converges slowly. We show that
the training of the back propagation network can be expressed as a
problem of solving coupled ordinary differential equations for the
weights as a (continuous) function of time. These differential
equations are usually mathematically stiff. The use of a stiff
differential equation solver ensures quick convergence to the
nearest least-squares minimum. Training proceeds at a rapidly
accelerating rate as the accuracy of the predictions increases, in
contrast with gradient descent and conjugate gradient methods. The
number of presentations required for accurate training is reduced
by up to several orders of magnitude over the conventional method.
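Written out in the post's terms (my reconstruction, not quoted from the
paper), the training equations and their Jacobian are

    \frac{dw_i}{dt} = -\frac{\partial E}{\partial w_i},
    \qquad
    E(w) = \tfrac{1}{2}\sum_{p}\bigl(y_p - f(x_p; w)\bigr)^2,
    \qquad
    J_{ij} = -\frac{\partial^2 E}{\partial w_i\, \partial w_j},

so the "second-derivative Jacobian matrix" of this post is the negative
Hessian of the sum of squared errors, of size (number of weights)**2.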
---------- U. S. Patent No. 5,046,020 abstract ----------
DISTRIBUTED PARALLEL PROCESSING NETWORK WHEREIN THE CONNECTION
WEIGHTS ARE GENERATED USING STIFF DIFFERENTIAL EQUATIONS
Inventor: David L. Filkin
Assignee: E. I. du Pont de Nemours and Company
U. S. Patent Number 5,046,020
Sep. 3, 1991
Abstract. A parallel distributed processing network of the back
propagation type is disclosed in which the weights of connection
between processing elements in the various layers of the network
are determined in accordance with the set of steady solutions of
the stiff differential equations governing the relationship
between the layers of the network.