some questions on training neural nets...

Tom Dietterich tgd at chert.CS.ORST.EDU
Wed Feb 2 13:02:30 EST 1994


In answer to the following:

   From: "Charles X. Ling" <ling at csd.uwo.ca>
   Date: Tue, 1 Feb 94 03:37:10 EST

   Hi neural net experts,

   Will cross-validation help ? [...]
   (could results on the validation set be coincident)?


Tom Dietterich replies:

			[stuff deleted]

There are many ways to manage the bias/variance tradeoff.  I would say
that there is nothing approaching complete agreement on the best
approaches (and more fundamentally, the best approach varies from one
application to another, since this is really a form of prior).  The
approaches can be summarized as

* early stopping
* error function penalties
* size optimization
  - growing
  - pruning
  - other

Early stopping usually employs cross-validation to decide when to stop
training (see below).  In my experience, training an overlarge
network with early stopping gives better performance than trying to
find the minimum network size.  It has the disadvantage that training
costs are very high.
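
A minimal sketch of this early-stopping procedure in Python (the
synthetic data, linear model, learning rate, and patience counter
below are illustrative assumptions, not part of the original message):
monitor error on a held-out validation set after each training step
and keep the weights that scored best on it.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data, split into a training part and a
# held-out validation part (assumed setup, for illustration only).
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(5)
best_w, best_err, patience, bad = w.copy(), np.inf, 20, 0

for epoch in range(5000):
    # One gradient step on the training-set mean squared error.
    w -= 0.01 * (2.0 / len(y_tr)) * X_tr.T @ (X_tr @ w - y_tr)
    # Per-example squared error on the held-out validation set.
    val_err = np.mean((X_va @ w - y_va) ** 2)
    if val_err < best_err:
        best_w, best_err, bad = w.copy(), val_err, 0   # keep the best weights so far
    else:
        bad += 1
        if bad >= patience:
            break   # stop when validation error has not improved for a while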

			[stuff deleted]

   3. If, for some reason, cross-validation is needed, and TR is split into
   TR1 (for training) and TR2 (for validation), what would be the proper way
   to do cross-validation? Training on TR1 uses only part of the information
   in TR, but using TR1 to find the right parameters and then training on
   TR1+TR2 may require parameters different from those estimated by training
   on TR1.

I use the TR1+TR2 approach.  On large data sets, this works well.  On
small data sets, the cross-validation estimates themselves are very
noisy, so I have not found it to be as successful.  I compute the
stopping point using the sum squared error per training example, so
that it scales.  I think it is an open research problem to know
whether this is the right thing to do.  [the reply continues..]

------- End of Forwarded Message
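
One plausible reading of Tom's TR1+TR2 procedure, sketched in Python
(the linear model, learning rate, and epoch counts are assumptions
made purely for illustration, not his actual setup): find the epoch at
which error on TR2 bottoms out, record the per-example training error
at that point, and then retrain on TR1+TR2 until the per-example error
reaches that recorded target, so the criterion scales with training
set size.

import numpy as np

def sse_per_example(w, X, y):
    # Sum squared error divided by the number of examples, so the
    # stopping criterion scales with training-set size.
    return np.mean((X @ w - y) ** 2)

def gd_weights(X, y, epochs, lr=0.01):
    # Plain gradient descent on mean squared error, yielding the
    # weight vector after each epoch.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * (2.0 / len(y)) * X.T @ (X @ w - y)
        yield w.copy()

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=120)
X1, y1, X2, y2 = X[:90], y[:90], X[90:], y[90:]   # TR1 / TR2 split

# Phase 1: train on TR1, pick the epoch where TR2 error is lowest, and
# record the per-example TR1 error at that epoch as the stopping target.
history = [(sse_per_example(w, X2, y2), sse_per_example(w, X1, y1))
           for w in gd_weights(X1, y1, epochs=2000)]
target = min(history)[1]

# Phase 2: retrain from scratch on TR1+TR2, stopping once the
# per-example error falls to the recorded target.
for w in gd_weights(X, y, epochs=5000):
    if sse_per_example(w, X, y) <= target:
        break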


In response to the last point, I supply a reference that provides theoretical
guidance from a statistical perspective.  It proves that cross-validation
estimates Integrated Mean Squared Error (IMSE) within a constant due to
noise.

			What this means:  

IMSE is a version of the mean squared error that accounts for the 
finite size of the training set.  Think of it as the expected squared
error obtained by training a network on random training sets of a 
particular size.   It is an ideal (i.e., in general, unobservable)
measure of generalization.

IMSE embodies the bias/variance tradeoff.  It can be decomposed into
the sum of two terms that directly quantify the bias and the variance.
Therefore, if IMSE captures the measure of generalization that is
relevant to you (which will depend on your learning task), then
least-squares cross-validation provides a realizable estimate of
generalization.
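
For concreteness, the decomposition can be written as follows, in
generic regression notation supplied here for illustration (not the
paper's own notation):

\[
\mathrm{IMSE}(N) \;=\; \int \Bigl[
      \bigl(\mathrm{E}_{D_N} f_{D_N}(x) - g(x)\bigr)^2
  \;+\; \mathrm{E}_{D_N}\bigl( f_{D_N}(x) - \mathrm{E}_{D_N} f_{D_N}(x) \bigr)^2
\Bigr]\, d\mu(x),
\]

where $g$ is the target regression function, $f_{D_N}$ is the network
fit to a random training set $D_N$ of size $N$, $\mathrm{E}_{D_N}$
averages over such training sets, and $\mu$ is the input distribution.
The first term is the (squared) bias and the second is the variance,
each integrated over the inputs.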


		Summary of the main results of the paper:

It proves that two versions of cross-validation
(one being the "hold-out set" version discussed above, and the other
being the "delete-1" version) provide unbiased and strongly consistent
estimates of IMSE.  This is statistical jargon meaning that, on
average, the estimate is accurate (i.e., the expectation of the
estimate for a given training set size equals the IMSE plus a noise
term), and asymptotically precise (in that as the training set and
test set sizes grow large, the estimate converges to the IMSE, within
the constant due to noise, with probability 1).
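
For readers unfamiliar with the "delete-1" (leave-one-out) version,
here is a sketch in Python (the least-squares fit and the helper names
are illustrative assumptions, not the estimator as defined in the
paper): each example is held out in turn, the model is refit on the
rest, and the squared errors on the held-out points are averaged.

import numpy as np

def delete1_cv(X, y, fit, predict):
    # Hold out each example in turn, refit on the rest, and average
    # the squared errors on the held-out points.
    n = len(y)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep])
        errs.append((predict(model, X[i]) - y[i]) ** 2)
    return np.mean(errs)

# Usage with an ordinary least-squares fit (illustrative only; for a
# neural network, `fit' would be a full training run, which is costly).
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda w, x: x @ w
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
print(delete1_cv(X, y, fit, predict))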

Note that it does not say anything about the rate at which the
variance of the estimate converges to the truth; therefore, it is
possible that other IMSE-approximating measures may excel for small
training set sizes (e.g., resampling methods such as the bootstrap and
the jackknife).  However, it is the first result generally applicable
to nonlinear regression that the authors are aware of, extending the
well-known (in the statistical and econometric literature) work by
C.J. Stone and others, who prove similar results for particular
learning tasks or for particular models.

The statement of the results will appear in NIPS 6.  I will post
the soon-to-be-completed extended version to Neuroprose if anyone
wants to see it sooner or needs access to the proofs.

I hope this is helpful,

= Mark Plutowski
  Institute for Neural Computation,
  and Department of Computer Science and Engineering
  University of California, San Diego
  La Jolla, California.  USA.



Here is the reference:


Plutowski, Mark E., Shinichi Sakata, and Halbert White (1994).
"Cross-validation estimates IMSE."  In Cowan, J.D., Tesauro, G., and
Alspector, J. (eds.), Advances in Neural Information Processing
Systems 6.  San Mateo, CA: Morgan Kaufmann Publishers.



