Mail problems...
Eric Ost
emo at iuvax.cs.indiana.edu
Thu Nov 3 22:34:29 EST 1988
We were temporarily experiencing mailer problems, and the following message
could not be delivered during that period. Sorry for the inconvenience.
*** begin message ***
From Connectionists-Request at q.cs.cmu.edu Thu Nov 3 21:12:43 1988
Received: from C.CS.CMU.EDU by Q.CS.CMU.EDU; 3 Nov 88 15:55:16 EST
Received: from [192.17.174.50] by C.CS.CMU.EDU; 3 Nov 88 15:52:55 EST
Received: from cougar.ccsr.uiuc.edu by uxc.cso.uiuc.edu with SMTP
(5.60+/IDA-1.2.5) id AA01805; Thu, 3 Nov 88 14:52:25 CST
Received: by cougar.ccsr.uiuc.edu (3.2/9.7)
id AA01584; Thu, 3 Nov 88 13:19:20 CST
Date: Thu, 3 Nov 88 13:19:20 CST
From: subutai at cougar.ccsr.uiuc.edu (Subutai Ahmad)
Message-Id: <8811031919.AA01584 at cougar.ccsr.uiuc.edu>
To: connectionists at c.cs.cmu.edu
Subject: Scaling and Generalization in Neural Networks
The following Technical Report is available. For a copy, please send
requests to subutai at complex.ccsr.uiuc.edu or:
Subutai Ahmad
Center for Complex Systems Research,
508 S. 6th St.
Champaign, IL 61820
USA
A Study of Scaling and Generalization in Neural Networks
Subutai Ahmad
Technical Report UIUCDCS-R-88-1454
Abstract
Scaling and generalization have emerged as key issues in current
studies of supervised learning from examples in neural networks.
Questions such as how many training patterns and training
cycles are needed for a problem of a given size and difficulty, how to
best represent the input, and how to choose useful training exemplars,
are of considerable theoretical and practical importance. Several
intuitive rules of thumb have been obtained from empirical studies,
although as yet there are few rigorous results. In this paper we
present a careful study of generalization in the simplest possible
case--perceptron networks learning linearly separable functions. The
task chosen was the majority function (i.e., return 1 if a majority
of the input units are on), a predicate with a number of useful
properties. We find that many aspects of generalization in multilayer
networks learning large, difficult tasks are reproduced in this simple
domain, in which concrete numerical results and even some analytic
understanding can be achieved.
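
[As a purely illustrative aside, not part of the report: the setup described
above can be sketched in a few lines of Python. The network size, training
set size, number of epochs, and function names below are arbitrary choices
for illustration; the abstract does not specify the report's actual
parameters or code.]

import numpy as np

def majority(x):
    # Return 1 if a majority of the input units are on (+1), else 0.
    return int(np.sum(x > 0) > len(x) / 2)

def make_random_patterns(d, S, rng):
    # S random training patterns over {-1, +1}^d, labeled by the majority function.
    X = rng.choice([-1.0, 1.0], size=(S, d))
    y = np.array([majority(x) for x in X])
    return X, y

def train_perceptron(X, y, epochs=50):
    # Classic perceptron rule: update weights only on misclassified patterns.
    d = X.shape[1]
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            out = int(w @ x + b > 0)
            w += (t - out) * x
            b += (t - out)
    return w, b

def failure_rate(w, b, d, n_test, rng):
    # Fraction of misclassified test instances on fresh random patterns.
    X, y = make_random_patterns(d, n_test, rng)
    pred = (X @ w + b > 0).astype(int)
    return float(np.mean(pred != y))

rng = np.random.default_rng(0)
d, S = 25, 200                 # d odd, so the majority is never tied
X, y = make_random_patterns(d, S, rng)
w, b = train_perceptron(X, y)
print(failure_rate(w, b, d, 5000, rng))

[Varying S for fixed d, or sweeping d with S = alpha * d, reproduces the kind
of measurement the next paragraph summarizes.]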
For a network with d input units trained on a set of S random training
patterns, we find that the failure rate, the fraction of misclassified
test instances, falls off exponentially as a function of S. In
addition, with S = alpha d, for fixed values of alpha, our studies
show that the failure rate remains constant, independent of d. This
implies that the number of training patterns required to achieve a
given performance level scales linearly with d. We also discuss
various ways in which this performance can be altered, with an
emphasis on the effects of the input representation and the specific
patterns used to train the network. We demonstrate a small change in
the representation that can lead to a jump in the performance level.
We also show that the most useful training instances are the ones
closest to the separating surface. With a training set consisting
only of such "borderline" training patterns, the failure rate
decreases faster than exponentially, and for a given training set
size, the performance of the network is significantly better than when
trained with random patterns. Finally, we compare the effects of the
initial state of the network and the training patterns on the final
state.
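
[Again as an illustrative aside, not the report's code: for the majority
function with +/-1 inputs, the separating surface is sum(x) = 0, so a
pattern's (unnormalized) distance from it is |sum of its inputs|. The sketch
below, with assumed sizes d = 25 and S = 1000, constructs "borderline"
patterns with roughly half the units on and compares their distances to
those of random patterns.]

import numpy as np

def distance_to_boundary(X):
    # Distance of a +/-1 pattern from the majority function's separating
    # surface sum(x) = 0, up to normalization: |sum of its inputs|.
    return np.abs(X.sum(axis=1))

d, S = 25, 1000
rng = np.random.default_rng(0)

# Random patterns over {-1, +1}^d.
random_X = rng.choice([-1.0, 1.0], size=(S, d))

# Borderline patterns: (d+1)//2 or (d-1)//2 units on, so flipping a single
# input changes the majority label.
borderline_X = np.full((S, d), -1.0)
for i in range(S):
    k = (d + 1) // 2 if i % 2 == 0 else (d - 1) // 2
    borderline_X[i, rng.choice(d, size=k, replace=False)] = 1.0

print(distance_to_boundary(random_X).mean())      # roughly 4 for d = 25
print(distance_to_boundary(borderline_X).mean())  # exactly 1 by construction

[Training the perceptron sketch above on borderline_X instead of random_X is
the comparison the abstract describes.]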
*** end message ***