Optimal Stopping Time paper
Stephen Judd
judd at scr.siemens.com
Fri Feb 18 21:31:24 EST 1994
***Do not forward to other bboards***
FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/wang.optistop.ps.Z
The file wang.optistop.ps.Z is now available for
copying from the Neuroprose repository:
Optimal Stopping and Effective Machine Complexity in Learning
Changfeng Wang         University of Pennsylvania
Santosh S. Venkatesh   University of Pennsylvania
J. Stephen Judd        Siemens Corporate Research
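
For readers new to the Neuroprose archive, a typical anonymous-FTP retrieval
session might look like the following (the host and filename are taken from
this announcement; the prompts and the final uncompress step are the usual
convention for .ps.Z files):

    unix> ftp archive.cis.ohio-state.edu
    Name: anonymous
    Password: <your e-mail address>
    ftp> cd pub/neuroprose
    ftp> binary
    ftp> get wang.optistop.ps.Z
    ftp> quit
    unix> uncompress wang.optistop.ps.Z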
Abstract:
We study the problem of when to stop training a class of feedforward networks
-- networks with fixed input weights, one hidden layer, and a linear output --
that are trained by gradient descent on a finite number of examples.
Under general regularity conditions, it is shown analytically that the
generalization performance passes, in general, through three distinct phases
during the learning process. In particular, the network has better
generalization performance when learning is stopped at a certain time before
the global minimum of the empirical error is reached. A notion of "effective
size" of a machine is defined and used to explain the trade-off between the
complexity of the machine and the training error in the learning process.
The study leads naturally to a network size selection criterion,
which turns out to be a generalization of Akaike's Information Criterion
for the learning process.
It is shown that stopping learning before reaching the global minimum of the
empirical error has the effect of network size selection.
(8 pages) To appear in NIPS-6 (1993).
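
The early-stopping effect described in the abstract is easy to observe
numerically. The sketch below is not the authors' code; it is a minimal
illustration, with all names and parameter values chosen for the example,
of the class of machines studied: one hidden layer with fixed random input
weights and a trainable linear output, fit by gradient descent on a small
noisy sample. Tracking the held-out error at each step exposes the point,
before the training error converges, at which generalization is best.

    import numpy as np

    rng = np.random.default_rng(0)

    # A noisy finite training sample and a large clean held-out set;
    # the held-out error stands in for generalization performance.
    def target(x):
        return np.sin(3 * x)

    n_train, n_val, n_hidden = 20, 200, 30
    x_train = rng.uniform(-1, 1, (n_train, 1))
    y_train = target(x_train) + 0.2 * rng.standard_normal((n_train, 1))
    x_val = rng.uniform(-1, 1, (n_val, 1))
    y_val = target(x_val)

    # Fixed random input weights; only the linear output layer is trained.
    W_in = rng.standard_normal((1, n_hidden))
    b_in = rng.standard_normal(n_hidden)
    H_train = np.tanh(x_train @ W_in + b_in)
    H_val = np.tanh(x_val @ W_in + b_in)

    w = np.zeros((n_hidden, 1))     # trainable linear output weights
    lr, n_steps = 0.01, 5000
    best_step, best_val = 0, np.inf

    for t in range(n_steps):
        resid = H_train @ w - y_train
        grad = (2.0 / n_train) * (H_train.T @ resid)   # gradient of the MSE
        w -= lr * grad
        val_err = float(np.mean((H_val @ w - y_val) ** 2))
        if val_err < best_val:
            best_step, best_val = t, val_err

    print(f"held-out error minimized at step {best_step} of {n_steps}")
    print(f"best held-out error: {best_val:.4f}")

On typical runs the held-out error bottoms out long before the training error
stops decreasing, which is exactly the pre-convergence stopping point whose
optimality the paper analyzes.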
sj

Stephen Judd
Siemens Corporate Research
755 College Rd. East, Princeton, NJ, USA 08540
phone: (609) 734-6573
fax: (609) 734-6565
judd at learning.scr.siemens.com