thesis available

Peter Riegler pr at physik.uni-wuerzburg.de
Tue May 20 15:32:57 EDT 1997


The following Ph.D. thesis is available via anonymous-ftp.

FTP-host:  ftp.uni-wuerzburg.de
FTP-file:  pub/dissertation/riegler/these.ps.gz

Dynamics of On-line Learning in Neural Networks

Peter Riegler

Institut fuer Theoretische Physik
Universitaet Wuerzburg
Am Hubland
D-97074 Wuerzburg, Germany

Abstract:

One of the most important features of natural as well as
artificial neural networks is their ability to adjust to
their environment by ``learning''. This results in the
network's ability to ``generalize'', i.e. to generate
with high probability the appropriate response to an unknown
input. The theoretical description of generalization in
artificial neural networks by means of statistical physics
is the subject of this thesis. The focus is on {\em on-line
learning}, where the examples used in the learning process are
presented sequentially. Hence, the
systems investigated are dynamical in nature. They typically
consist of a large number of degrees of freedom, requiring a
description in terms of order parameters.
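
As a purely illustrative sketch (not taken from the thesis), the following
Python fragment mimics this on-line setting for a simple perceptron:
examples are presented one at a time, and the many microscopic weights are
summarized by a few macroscopic order parameters. The teacher vector, the
input distribution and the plain Hebbian update used here are assumptions
made only for this illustration.

  import numpy as np

  N = 1000                                   # number of weights (degrees of freedom)
  rng = np.random.default_rng(0)

  B = rng.normal(size=N)
  B *= np.sqrt(N) / np.linalg.norm(B)        # teacher vector, normalized so B.B = N
  J = np.zeros(N)                            # student weights

  eta = 1.0                                  # learning rate
  for mu in range(1, 20 * N + 1):            # examples arrive one at a time (on-line)
      xi = rng.normal(size=N)                # random input pattern
      sigma = np.sign(B @ xi)                # teacher's label for this pattern
      J += (eta / N) * sigma * xi            # simple Hebbian on-line update

      if mu % (5 * N) == 0:
          # macroscopic description: two order parameters instead of N weights
          R = (J @ B) / N                    # student-teacher overlap
          Q = (J @ J) / N                    # squared student norm
          eps_g = np.arccos(R / np.sqrt(Q)) / np.pi   # generalization error
          print(f"alpha={mu/N:5.1f}  R={R:.3f}  Q={Q:.3f}  eps_g={eps_g:.3f}")

In the statistical physics treatment such order parameters become
self-averaging in the limit of large N and obey closed differential
equations in the rescaled time alpha = mu/N.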

In the first part of this work the most fundamental network,
the perceptron, is investigated. Following a recent proposal
by Kinouchi and Caticha, it will be shown how a learning dynamics
that results in optimal generalization ability can be derived
from first principles. Results will be presented for learning
processes in which the training examples are corrupted by
different types of noise. The resulting generalization ability
will be shown to be comparable to that in the noiseless case.
Furthermore, the results obtained reveal striking similarities
to those found for batch learning.
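
To make the notions of noisy training examples and of a modulated update
rule concrete, here is a schematic (and purely hypothetical) continuation of
the sketch above: training labels are flipped with some probability
("output noise"), and the weight change is written with a general modulation
function f. The particular f used below is only a Hebbian placeholder, not
the optimal function derived in the thesis.

  import numpy as np

  def noisy_label(B, xi, flip_prob, rng):
      """Teacher label, flipped with probability flip_prob (output noise)."""
      sigma = np.sign(B @ xi)
      return -sigma if rng.random() < flip_prob else sigma

  def online_step(J, B, flip_prob, rng, f=lambda h, sigma: 1.0):
      """One on-line update  J <- J + (1/N) f(h, sigma) sigma xi."""
      N = J.size
      xi = rng.normal(size=N)
      sigma = noisy_label(B, xi, flip_prob, rng)   # corrupted training label
      h = (J @ xi) / np.sqrt(N)                    # student's local field
      return J + (1.0 / N) * f(h, sigma) * sigma * xi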

The optimal algorithms derived will be shown to depend on the
characteristics of the particular learning task, including the
type and strength of the corrupting noise. In general, these
characteristic quantities must therefore be estimated as well.
For the noise strength, this estimation leads to interesting
dynamical phase transitions.

The second part deals with the dynamical properties of two-layer
neural networks. This is of particular importance since these
networks are known to be universal approximators. Understanding
their dynamical features will help in constructing fast training
algorithms that lead to the best possible generalization.

Specifically, an exact analysis of learning a rule by on-line
gradient descent (backpropagation of error) in a two-layered
neural network will be presented. Here, the emphasis is on
adjustable hidden-to-output weights, which have so far been left
out of such analyses in the literature. Results are compared
with the training of networks having the same architecture but
fixed weights in the second layer. It will be shown that certain
features of learning in a two-layered neural network are independent
of the state of the second layer. Motivated by this result, it will
be argued that putting the dynamics of the hidden-to-output weights
on a faster time scale will speed up the learning process.
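
As a rough illustration (under assumptions not stated in the abstract: erf
hidden activations, a linear output unit and a squared-error cost, as is
common in this line of work), the following sketch shows one on-line
gradient-descent step for a two-layer student with adjustable
hidden-to-output weights; choosing eta2 larger than eta1 puts the second
layer on a faster time scale in the sense argued above.

  import numpy as np
  from scipy.special import erf

  def g(x):                                   # hidden-unit activation
      return erf(x / np.sqrt(2.0))

  def g_prime(x):
      return np.sqrt(2.0 / np.pi) * np.exp(-0.5 * x * x)

  def online_backprop_step(W, v, xi, tau, eta1, eta2):
      """One on-line step. W: (K, N) input-to-hidden weights,
      v: (K,) hidden-to-output weights, xi: input, tau: teacher output."""
      N = xi.size
      h = (W @ xi) / np.sqrt(N)               # hidden local fields
      sigma = v @ g(h)                        # student output
      delta = tau - sigma                     # error on this single example
      # gradient descent on the squared error for both layers
      W_new = W + (eta1 / N) * np.outer(delta * v * g_prime(h), xi)
      v_new = v + (eta2 / N) * delta * g(h)   # larger eta2 -> faster second layer
      return W_new, v_new

Setting eta2 = 0 in this sketch recovers the comparison case of fixed
weights in the second layer.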

For all systems investigated, simulations confirm the results.


________________________________________________________________

     _/_/_/_/_/_/_/_/_/_/_/_/
        _/            _/        Peter Riegler
       _/   _/_/_/   _/        Institut fuer Theoretische Physik
      _/   _/   _/  _/        Universitaet Wuerzburg
     _/   _/_/_/   _/        Am Hubland
    _/   _/       _/        D-97074 Wuerzburg, Germany
 _/_/_/        _/_/_/

 phone: (++49) (0)931 888-4908
 fax:   (++49) (0)931 888-5141
 email: pr at physik.uni-wuerzburg.de
 www:   http://www.physik.uni-wuerzburg.de/~pr
________________________________________________________________



