Preprints on Statistical Mechanics of Learning

nzt@research.att.com
Sat May 25 09:50:38 EDT 1991


The following preprints are available by ftp from the
neuroprose archive at cheops.cis.ohio-state.edu.

1.   Statistical Mechanics of Learning from Examples
     I: General Formulation and Annealed Approximation

2.   Statistical Mechanics of Learning from Examples
     II: Quenched Theory and Unrealizable Rules

by: Sebastian Seung, Haim Sompolinsky, and Naftali Tishby


This is a detailed two-part analytical and numerical study of
learning curves in large neural networks, using techniques of
equilibrium statistical mechanics.



                          Abstract - Part I
  Learning from examples in feedforward neural networks is studied using
  equilibrium statistical mechanics.  Two simple approximations to the
  exact quenched theory are presented: the high-temperature limit and
  the annealed approximation.  Within these approximations, we study
  four models of perceptron learning of realizable target rules.
  In each model, the target rule is perfectly realizable because it is
  another perceptron of identical architecture.  We focus on the
  generalization curve, i.e., the average generalization error as a
  function of the number of examples.  The case of continuously varying
  weights is considered first, for both linear and Boolean output units.
  In these two models, learning is gradual, with generalization curves
  that asymptotically obey inverse power laws.  Two other model
  perceptrons, with weights constrained to be discrete, exhibit sudden
  learning.  For a linear output, there is a first-order transition,
  occurring at low temperatures, from a state of poor generalization to
  a state of good generalization.  Beyond the transition, the
  generalization curve decays exponentially to zero.  For a Boolean
  output, the first-order transition is to perfect generalization at all
  temperatures.  Monte Carlo simulations confirm that these approximate
  analytical results are quantitatively accurate at high temperatures
  and qualitatively correct at low temperatures.  For unrealizable rules
  the annealed approximation breaks down in general, as we illustrate
  with a final model of a linear perceptron with an unrealizable
  threshold.  Finally, we propose a general classification of
  generalization curves in models of realizable rules.
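
As a rough illustration of the discrete-weight models above, here is a minimal
sketch (not code from the preprints; all sizes, the temperature, and the random
seed are arbitrary choices) of the kind of Monte Carlo experiment referred to:
a student perceptron with +/-1 weights is trained on examples labeled by a
teacher perceptron of identical architecture, by Metropolis sampling of the
training error at temperature T, and its generalization error is then
estimated on fresh random inputs.

# Minimal sketch (not the preprints' code) of Monte Carlo learning in a
# perceptron with +/-1 weights, trained on a rule defined by a teacher
# perceptron of identical architecture.  Sizes and temperature are
# illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
N, P, T, STEPS = 51, 200, 2.0, 20000     # N odd so +/-1 dot products never vanish

teacher = rng.choice([-1, 1], size=N)            # target rule: another perceptron
X = rng.choice([-1, 1], size=(P, N))             # training inputs
y = np.sign(X @ teacher)                         # Boolean target outputs

def train_error(w):
    """Training energy: number of misclassified training examples."""
    return int(np.sum(np.sign(X @ w) != y))

w = rng.choice([-1, 1], size=N)                  # random initial student
E = train_error(w)
for _ in range(STEPS):                           # Metropolis single-weight flips
    i = rng.integers(N)
    w[i] *= -1
    E_new = train_error(w)
    if E_new <= E or rng.random() < np.exp(-(E_new - E) / T):
        E = E_new                                # accept the flip
    else:
        w[i] *= -1                               # reject: undo the flip

# Estimate the generalization error on fresh random inputs.
X_test = rng.choice([-1, 1], size=(5000, N))
eps_g = np.mean(np.sign(X_test @ w) != np.sign(X_test @ teacher))
print(f"training errors: {E}/{P}   estimated generalization error: {eps_g:.3f}")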

                          Abstract - Part II
  Learning from examples in feedforward neural networks is studied using
  the replica method.  We focus on the generalization curve, defined as
  the average generalization error as a function of the number of
  examples.  For smooth networks, i.e., those with continuously varying
  weights and smooth transfer functions, the generalization curve is
  found to asymptotically obey an inverse power law.  This implies that
  generalization curves in smooth networks are generically gradual.  In
  contrast, for discrete networks, discontinuous learning transitions
  can occur.  We illustrate both gradual and discontinuous learning with
  four single-layer perceptron models.  In each model, a perceptron is
  trained on a perfectly realizable target rule, i.e., a rule generated
  by another perceptron of identical architecture.  The replica method
  yields results that are qualitatively similar to the approximate
  results derived in Part I for these models.  We also study another
  class of perceptron models, in which the target rule is unrealizable
  because it is generated by a perceptron of mismatched architecture.
  In this class of models, the quenched disorder inherent in the random
  sampling of the examples plays an important role, yielding
  generalization curves that differ from those predicted by the simple
  annealed approximation of Part I.  In addition, this disorder leads to
  the appearance of equilibrium spin-glass phases, at least at low
  temperatures.  Unrealizable rules also exhibit the phenomenon of
  overtraining, in which training at zero temperature produces worse
  generalization than training at nonzero temperature.
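
For reference, the quenched averages in Part II rest on the standard replica
identity (a generic sketch in our notation, not an excerpt from the preprint),
in which the average of ln Z over random example sets is obtained from the
moments of Z continued to n -> 0:

% Standard replica identity for the quenched average of the free energy.
% Z is the Gibbs partition function over student weights W at temperature
% 1/beta, with E_t(W) denoting the training error of W on the p examples.
\[
  \langle \ln Z \rangle
  \;=\; \lim_{n \to 0} \frac{\langle Z^{n} \rangle - 1}{n},
  \qquad
  Z \;=\; \int \! dW \, e^{-\beta E_t(W)} .
\]
% The moments <Z^n> are computed for integer n by introducing n coupled
% replicas of the student and averaging over the random examples; the
% result is then analytically continued to n -> 0.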


Here's what to do to get the files from neuroprose:

              unix> ftp cheops.cis.ohio-state.edu (or 128.146.8.62)
              Name: anonymous
              Password: neuron
              ftp> cd pub/neuroprose
              ftp> binary
              ftp> get tishby.sst1.ps.Z
              ftp> get tishby.sst2.ps.Z
              ftp> quit
              unix> uncompress tishby.sst*
              unix> lpr tishby.sst* (or however you print PostScript)

Sebastian Seung
Haim Sompolinsky
Naftali Tishby
----------------------------------------------------------------------------


