Mail problems...
Eric Ost
emo at iuvax.cs.indiana.edu
Thu Nov 3 22:32:24 EST 1988
We were temporarily experiencing mailer problems, and the following
message could not be delivered during that period. Sorry for the
inconvenience.
*** begin message ***
>From Connectionists-Request at q.cs.cmu.edu Thu Nov 3 19:13:36 1988
Received: from C.CS.CMU.EDU by Q.CS.CMU.EDU; 3 Nov 88 15:55:16 EST
Received: from [192.17.174.50] by C.CS.CMU.EDU; 3 Nov 88 15:52:55 EST
Received: from cougar.ccsr.uiuc.edu by uxc.cso.uiuc.edu with SMTP
(5.60+/IDA-1.2.5) id AA01805; Thu, 3 Nov 88 14:52:25 CST
Received: by cougar.ccsr.uiuc.edu (3.2/9.7)
id AA01584; Thu, 3 Nov 88 13:19:20 CST
Date: Thu, 3 Nov 88 13:19:20 CST
From: subutai at cougar.ccsr.uiuc.edu (Subutai Ahmad)
Message-Id: <8811031919.AA01584 at cougar.ccsr.uiuc.edu>
To: connectionists at c.cs.cmu.edu
Subject: Scaling and Generalization in Neural Networks
The following Technical Report is available. For a copy, please send
requests to subutai at complex.ccsr.uiuc.edu or:
Subutai Ahmad
Center for Complex Systems Research,
508 S. 6th St.
Champaign, IL 61820
USA
A Study of Scaling and Generalization in Neural Networks
Subutai Ahmad
Technical Report UIUCDCS-R-88-1454
Abstract
Scaling and generalization have emerged as key issues in current
studies of supervised learning from examples in neural
networks. Questions such as how many training patterns and training
cycles are needed for a problem of a given size and difficulty, how to
best represent the input, and how to choose useful training exemplars,
are of considerable theoretical and practical importance. Several
intuitive rules of thumb have been obtained from empirical studies,
although as yet there are few rigorous results. In this paper we
present a careful study of generalization in the simplest possible
case--perceptron networks learning linearly separable functions. The
task chosen was the majority function (i.e., return 1 if a majority
of the input units are on), a predicate with a number of useful
properties. We find that many aspects of generalization in multilayer
networks learning large, difficult tasks are reproduced in this simple
domain, in which concrete numerical results and even some analytic
understanding can be achieved.
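
As a minimal sketch of this setup (modern Python/NumPy, not the
report's code; all parameter values are illustrative): a perceptron is
trained on S random d-bit patterns labelled by the majority function,
and the failure rate is estimated on held-out random test patterns.

import numpy as np

def majority(x):
    # 1 if a majority of the d input units are on, else 0 (d assumed odd).
    return int(x.sum() > len(x) / 2)

def failure_rate(d=25, S=200, n_test=2000, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(S, d))        # S random training patterns
    y = np.array([majority(x) for x in X])
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):                    # classic perceptron rule
        for x, t in zip(X, y):
            out = int(w @ x + b > 0)
            w += (t - out) * x
            b += t - out
    Xt = rng.integers(0, 2, size=(n_test, d))  # random test patterns
    yt = np.array([majority(x) for x in Xt])
    return np.mean((Xt @ w + b > 0).astype(int) != yt)

# Sweeping S (e.g. S = alpha * d for several alpha and d) and plotting
# failure_rate against S gives curves of the kind the abstract describes.
print(failure_rate())
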
For a network with d input units trained on a set of S random training
patterns, we find that the failure rate, the fraction of misclassified
test instances, falls off exponentially as a function of S. In
addition, with S = alpha d for fixed values of alpha, our studies
show that the failure rate remains constant, independent of d. This
implies that the number of training patterns required to achieve a
given performance level scales linearly with d. We also discuss
various ways in which this performance can be altered, with an
emphasis on the effects of the input representation and the specific
patterns used to train the network. We demonstrate that a small
change in the representation can lead to a jump in the performance
level. We also show that the most useful training instances are the
ones closest to the separating surface. With a training set consisting
only of such "borderline" training patterns, the failure rate
decreases faster than exponentially, and for a given training set
size, the performance of the network is significantly better than when
trained with random patterns. Finally, we compare the effects of the
initial state of the network and the training patterns on the final
state.
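
One reading of the "borderline" construction (an assumption for
illustration, not the report's code): for the majority function, the
patterns closest to the separating surface are those with either
floor(d/2) or floor(d/2)+1 units on. A generator for such patterns,
which can stand in for the random training set X in the sketch above
so the two regimes can be compared at the same S:

import numpy as np

def borderline_patterns(rng, S, d):
    # Patterns with floor(d/2) or floor(d/2)+1 units on, i.e. the inputs
    # lying closest to the majority function's separating surface.
    X = np.zeros((S, d), dtype=int)
    for i in range(S):
        k = d // 2 + rng.integers(0, 2)
        on = rng.choice(d, size=k, replace=False)
        X[i, on] = 1
    return X
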
*** end message ***