two papers: On-Line Gibbs Learning

Jai Won Kim jkim at FIZ.HUJI.AC.IL
Wed Dec 24 05:16:20 EST 1997


ftp-host: keter.fiz.huji.ac.il
ftp-filenames: pub/OLGA1.tar and pub/OLGA2.tar

______________________________________________________________________________



                    On-Line Gibbs Learning I:
                        General Theory

                 H. Sompolinsky and J. W. Kim
      Racah Institute of Physics and Center for Neural Computation,
            Hebrew University, Jerusalem 91904, Israel 
         e-mails: jkim at fiz.huji.ac.il ; haim at fiz.huji.ac.il	

             (Submitted to Physical Review E, Dec 97)

            
                           ABSTRACT

 We study a new model of on-line learning, the On-Line Gibbs Algorithm (OLGA),
which is particularly suitable for supervised learning of realizable and
unrealizable discrete-valued functions. The learning algorithm is based on an
on-line energy function, $E$, that balances the need to minimize the
instantaneous error against the need to minimize the change in the weights.
The relative importance of these terms in $E$ is determined by a
parameter $\lambda$, whose inverse plays a role similar to that of the
learning-rate parameters in other on-line learning algorithms.
In the stochastic version of the algorithm, following each presentation
of a labeled example the new weights are chosen from a Gibbs distribution
with the on-line energy $E$ and a temperature parameter $T$.
In the zero-temperature version of OLGA, at each step the new weights
are those that minimize the on-line energy $E$. The generalization
performance of OLGA is studied analytically in the limit of small
learning rate, i.e., $\lambda \rightarrow \infty$. It is shown that
at finite temperature OLGA converges to an equilibrium Gibbs distribution
of weight vectors whose energy function equals the generalization
error function. In its deterministic version, OLGA converges to a local
minimum of the generalization error. The rate of convergence is studied
for the case in which the parameter $\lambda$ increases as a power law in time.
It is shown that for a generic unrealizable dichotomy, the asymptotic rate of
decrease of the generalization error is proportional to the inverse square
root of the number of presented examples. This rate is similar to that of
batch learning. The special cases of learning realizable dichotomies or
dichotomies with output noise are also treated.
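
To make the update rule concrete, the following is a minimal sketch (our
own illustration, not the authors' code) of a single zero-temperature OLGA
step for a perceptron with 0/1 error: the new weights minimize
E(w') = err(w'; x, y) + (lambda/2)||w' - w||^2. The function name and the
small margin parameter are assumptions made for this example.

    import numpy as np

    def olga_zero_temperature_step(w, x, y, lam, margin=1e-6):
        # One zero-temperature OLGA step for a perceptron with 0/1 error.
        # The new weights minimize E(w') = err(w'; x, y) + (lam/2)*||w' - w||^2,
        # where err is the 0/1 loss of sign(w'.x) against the label y.
        # The minimizer is either w itself (keep the error, pay 1) or the
        # smallest weight change that classifies the example correctly.
        s = y * np.dot(w, x)
        if s > 0:                                      # already correct: w' = w
            return w
        delta = (-s / np.dot(x, x) + margin) * y * x   # smallest sign-flipping change
        cost_move = 0.5 * lam * np.dot(delta, delta)   # err = 0, pay (lam/2)||delta||^2
        cost_stay = 1.0                                # keep w, pay the unit 0/1 error
        return w + delta if cost_move < cost_stay else w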


A tar archive of the PostScript file of the paper is available via anonymous ftp at

ftp-host: keter.fiz.huji.ac.il
ftp-filename: pub/OLGA1.tar
 
_______________________________________________________________________________

                    On-Line Gibbs Learning II:
           Application to Perceptron and Multi-layer Networks         

                    J. W. Kim and H. Sompolinsky 
      Racah Institute of Physics and Center for Neural Computation,
            Hebrew University, Jerusalem 91904, Israel 
         e-mails: jkim at fiz.huji.ac.il ; haim at fiz.huji.ac.il	

             (Submitted to Physical Review E, Dec 97)

            
                           ABSTRACT

In a previous paper (On-Line Gibbs Learning I: General Theory)
we presented the On-Line Gibbs Algorithm (OLGA) and studied
its asymptotic convergence analytically.
In this paper we apply OLGA to on-line supervised learning in
several network architectures: a single-layer perceptron, a two-layer
committee machine, and a Winner-Takes-All (WTA) classifier.
The behavior of OLGA for a single-layer perceptron is studied both analytically
and numerically for a variety of rules: a realizable perceptron rule,
a perceptron rule corrupted by output and input noise, and a rule
generated by a committee machine. The two-layer committee machine is
studied numerically for the cases of learning a realizable rule as well as a
rule corrupted by output noise. The WTA network is studied
numerically for the case of a realizable rule. The asymptotic results
reported in this paper agree with the predictions of the general theory
of OLGA presented in article I. In all the cases studied, OLGA converges to
a set of weights that minimizes the generalization error.
When the learning rate is chosen as a power law with an optimal power,
OLGA converges with the same power law as batch learning.
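
As a usage illustration (again our own sketch, not the authors' code), the
zero-temperature step sketched after paper I can be run on a single-layer
perceptron learning a teacher rule corrupted by output noise, with lambda
growing as a power law in the number of presented examples. The network
size, noise level, schedule prefactor, and exponent below are arbitrary
choices for the illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100
    w_teacher = rng.standard_normal(N)       # teacher perceptron
    w = rng.standard_normal(N)               # student weights

    for t in range(1, 20001):
        x = rng.standard_normal(N)           # random input example
        y = np.sign(np.dot(w_teacher, x))    # teacher label
        if rng.random() < 0.05:              # output noise: flip the label
            y = -y
        lam = 0.1 * t ** 0.5                 # power-law schedule lam(t) ~ t^p
        w = olga_zero_temperature_step(w, x, y, lam)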


A tar archive of the PostScript file of the paper is available via anonymous ftp at

ftp-host: keter.fiz.huji.ac.il
ftp-filename: pub/OLGA2.tar


