Optimal Stopping Time paper
Stephen Judd
judd at scr.siemens.com
Fri Feb 18 21:31:24 EST 1994
***Do not forward to other bboards***
FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/wang.optistop.ps.Z
The file wang.optistop.ps.Z is now available for
copying from the Neuroprose repository:
Optimal Stopping and Effective Machine Complexity in Learning
Changfeng Wang         University of Pennsylvania
Santosh S. Venkatesh   University of Pennsylvania
J. Stephen Judd        Siemens Corporate Research
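
For readers new to the Neuroprose archive, a typical anonymous-FTP retrieval
session might look like the following (the host and filename are taken from
this announcement; the prompts and the final uncompress step are the usual
convention for .ps.Z files):

    unix> ftp archive.cis.ohio-state.edu
    Name: anonymous
    Password: <your e-mail address>
    ftp> cd pub/neuroprose
    ftp> binary
    ftp> get wang.optistop.ps.Z
    ftp> quit
    unix> uncompress wang.optistop.ps.Z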
Abstract:
We study the problem of when to stop training a class of feedforward networks
-- networks with fixed input weights, one hidden layer, and a linear output --
that are trained by gradient descent on a finite number of examples.
Under general regularity conditions, it is shown analytically that the
generalization performance passes, in general, through three distinct phases
during the learning process. In particular, the network has better
generalization performance when learning is stopped at a certain time before
the global minimum of the empirical error is reached. A notion of "effective
size" of a machine is defined and used to explain the trade-off between the
complexity of the machine and the training error in the learning process.
The study leads naturally to a network size selection criterion,
which turns out to be a generalization of Akaike's Information Criterion
for the learning process.
It is shown that stopping learning before reaching the global minimum of the
empirical error has the effect of network size selection.
(8 pages) To appear in NIPS-6 (1993).
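
The early-stopping effect described in the abstract is easy to observe
numerically. The sketch below is not the authors' code; it is a minimal
illustration, with all names and parameter values chosen for the example,
of the class of machines studied: one hidden layer with fixed random input
weights and a trainable linear output, fit by gradient descent on a small
noisy sample. Tracking the held-out error at each step exposes the point,
before the training error converges, at which generalization is best.

    import numpy as np

    rng = np.random.default_rng(0)

    # A noisy finite training sample and a large clean held-out set;
    # the held-out error stands in for generalization performance.
    def target(x):
        return np.sin(3 * x)

    n_train, n_val, n_hidden = 20, 200, 30
    x_train = rng.uniform(-1, 1, (n_train, 1))
    y_train = target(x_train) + 0.2 * rng.standard_normal((n_train, 1))
    x_val = rng.uniform(-1, 1, (n_val, 1))
    y_val = target(x_val)

    # Fixed random input weights; only the linear output layer is trained.
    W_in = rng.standard_normal((1, n_hidden))
    b_in = rng.standard_normal(n_hidden)
    H_train = np.tanh(x_train @ W_in + b_in)
    H_val = np.tanh(x_val @ W_in + b_in)

    w = np.zeros((n_hidden, 1))     # trainable linear output weights
    lr, n_steps = 0.01, 5000
    best_step, best_val = 0, np.inf

    for t in range(n_steps):
        resid = H_train @ w - y_train
        grad = (2.0 / n_train) * (H_train.T @ resid)   # gradient of the MSE
        w -= lr * grad
        val_err = float(np.mean((H_val @ w - y_val) ** 2))
        if val_err < best_val:
            best_step, best_val = t, val_err

    print(f"held-out error minimized at step {best_step} of {n_steps}")
    print(f"best held-out error: {best_val:.4f}")

On typical runs the held-out error bottoms out long before the training error
stops decreasing, which is exactly the pre-convergence stopping point whose
optimality the paper analyzes.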
sj

Stephen Judd
Siemens Corporate Research
755 College Rd. East, Princeton, NJ, USA 08540
phone: (609) 734-6573
fax: (609) 734-6565
judd at learning.scr.siemens.com