Preprints on Statistical Mechanics of Learning
nzt@research.att.com
Sat May 25 09:50:38 EDT 1991
The following preprints are available by ftp from the
neuroprose archive at cheops.cis.ohio-state.edu.
1. Statistical Mechanics of Learning from Examples
I: General Formulation and Annealed Approximation
2. Statistical Mechanics of Learning from Examples
II: Quenched Theory and Unrealizable Rules
by: Sebastian Seung, Haim Sompolinsky, and Naftali Tishby
This is a detailed two-part analytical and numerical study of
learning curves in large neural networks, using techniques of
equilibrium statistical mechanics.
Abstract - Part I
Learning from examples in feedforward neural networks is studied using
equilibrium statistical mechanics. Two simple approximations to the
exact quenched theory are presented: the high temperature limit and
the annealed approximation. Within these approximations, we study
four models of perceptron learning of realizable target rules.
In each model, the target rule is perfectly realizable because it is
another perceptron of identical architecture. We focus on the
generalization curve, i.e. the average generalization error as a
function of the number of examples. The case of continuously varying
weights is considered first, for both linear and boolean output units.
In these two models, learning is gradual, with generalization curves
that asymptotically obey inverse power laws. Two other model
perceptrons, with weights that are constrained to be discrete, exhibit
sudden learning. For a linear output, there is a first-order
transition occurring at low temperatures, from a state of poor
generalization to a state of good generalization. Beyond the
transition, the generalization curve decays exponentially to zero.
For a boolean output, the first-order transition is to perfect
generalization at all temperatures. Monte Carlo simulations confirm
that these approximate analytical results are quantitatively accurate
at high temperatures and qualitatively correct at low temperatures.
For unrealizable rules, the annealed approximation breaks down in
general, as we illustrate with a final model of a linear perceptron
with an unrealizable threshold. Finally, we propose a general
classification of generalization curves in models of realizable rules.
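As a rough illustration of the realizable teacher-student setting described
above, here is a minimal Python sketch (not taken from the preprints: the
input dimension, training-set sizes, zero-temperature perceptron rule, and
all names below are assumptions made for the sketch) that estimates the
generalization error of a boolean-output student trained on examples
labeled by a random teacher of the same architecture:

    # Minimal sketch (illustrative only): Monte Carlo estimate of the
    # generalization error of a student perceptron trained on examples
    # labeled by a teacher perceptron of identical architecture.
    import numpy as np

    rng = np.random.default_rng(0)

    N = 50          # input dimension (assumed for the sketch)
    N_TEST = 5000   # test examples used to estimate generalization error

    teacher = rng.standard_normal(N)   # random teacher ("target rule") weights

    def labels(w, x):
        """Boolean (sign) outputs of a perceptron with weight vector w."""
        return np.sign(x @ w)

    def train_perceptron(x, y, epochs=50):
        """Plain perceptron updates; these converge here because the rule
        is realizable (the training data are separable by construction)."""
        w = np.zeros(N)
        for _ in range(epochs):
            for xi, yi in zip(x, y):
                if yi * (xi @ w) <= 0:
                    w += yi * xi
        return w

    x_test = rng.standard_normal((N_TEST, N))
    y_test = labels(teacher, x_test)

    for p in (10, 50, 100, 200, 400):   # number of training examples
        x_train = rng.standard_normal((p, N))
        y_train = labels(teacher, x_train)
        student = train_perceptron(x_train, y_train)
        eps_g = np.mean(labels(student, x_test) != y_test)
        print(f"p = {p:4d}   estimated generalization error = {eps_g:.3f}")

The errors printed for increasing p trace a crude empirical generalization
curve; the preprints compute the corresponding quantity analytically for
several variants of this setup.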
Abstract - Part II
Learning from examples in feedforward neural networks is studied using
the replica method. We focus on the generalization curve, which
is defined as the average generalization error as a function of the
number of examples. For smooth networks, i.e. those with
continuously varying weights and smooth transfer functions, the
generalization curve is found to asymptotically obey an inverse power
law. This implies that generalization curves in smooth networks are
generically gradual. In contrast, for discrete networks,
discontinuous learning transitions can occur. We illustrate both
gradual and discontinuous learning with four single-layer perceptron
models. In each model, a perceptron is trained on a perfectly
realizable target rule, i.e. a rule that is generated by another
perceptron of identical architecture. The replica method yields
results that are qualitatively similar to the approximate results
derived in Part I for these models. We study another class of
perceptron models, in which the target rule is unrealizable
because it is generated by a perceptron of mismatched architecture.
In this class of models, the quenched disorder inherent in the random
sampling of the examples plays an important role, yielding
generalization curves that differ from those predicted by the simple
annealed approximation of Part I. In addition, this disorder leads to
the appearance of equilibrium spin glass phases, at least at low
temperatures. Unrealizable rules also exhibit the phenomenon of
overtraining, in which training at zero temperature yields worse
generalization than training at nonzero temperature.
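In symbols, the central quantity of both abstracts can be written
schematically as follows (the notation here is assumed for illustration,
not quoted from the preprints; the disagreement probability applies to the
boolean-output case, and the average is over training sets and, in the
statistical-mechanics treatment, over the thermal distribution of student
weights):

    \epsilon_g(p) \;=\; \Big\langle\, \Pr_x\big[\,\sigma_W(x) \neq \sigma_{W_0}(x)\,\big] \Big\rangle,
    \qquad
    \epsilon_g(p) \;\sim\; c\, p^{-\nu} \quad (p \to \infty),

where W_0 is the teacher ("target rule"), W the trained student, p the
number of examples, and the asymptotic inverse power law with some exponent
\nu is the generic gradual behaviour described above for smooth networks.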
Here's what to do to get the files from neuroprose:
unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> binary
ftp> get tishby.sst1.ps.Z
ftp> get tishby.sst2.ps.Z
ftp> quit
unix> uncompress tishby.sst*
unix> lpr tishby.sst* (or however you print postscript)
Sebastian Seung
Haim Sompolinsky
Naftali Tishby