some questions on training neural nets

Yong Liu yong at cns.brown.edu
Tue Feb 8 10:40:35 EST 1994


On the discussion of cross-validation method, Dr. Plutowski
referred to his paper by writing 

> It proves that two versions of cross-validation
> (one being the "hold-out set" version discussed above, and the other
> being the "delete-1" version) provide unbiased and strongly consistent
> estimates of IMSE.  This is statistical jargon meaning that, on
> average, the estimate is accurate (i.e., the expectation
> of the estimate for a given training set size equals the IMSE + a noise term)
> and asymptotically precise (in that as the training set and test set
> size grow large, the estimate converges to the IMSE within the
> constant factor due to noise, with probability 1).

Comment: 
  This comment is on the above result about the "delete-1" version of
cross-validation.  The result must assume that the training data set
contains no outliers (corruption in the Y component of a data point).
Deleting a data point that is an outlier will cause a large change in
the estimated neural net weights, and the squared prediction error on
that outlier will be large.  This will eventually bias the estimate of
the IMSE.

----------------------------
Yong Liu
Box 1843
Department of Physics
Institute for Brain and Neural Systems
Brown University
Providence, RI 02912

------- End of Previous Message -------



No, actually it turns out that delete-1 cross-validation delivers 
unbiased estimates of IMSE under fairly reasonable conditions.
(More precisely, it delivers estimates of IMSE_N + \sigma^2,
for training set size N and noise variance \sigma^2.) 
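
(For concreteness, here is a minimal sketch of the delete-1 estimate
itself, using ordinary least squares on a linear model as a stand-in for
the network; the fit/delete1_cv names and the linear model are my own
illustrative assumptions, not the construction in the paper.)

    import numpy as np

    def fit(X, y):
        # stand-in "training": least-squares weights for a linear model
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    def delete1_cv(X, y):
        # delete-1 (leave-one-out) estimate of IMSE_N + sigma^2
        N = len(y)
        errs = []
        for i in range(N):
            keep = np.arange(N) != i
            w_i = fit(X[keep], y[keep])            # retrain on the N-1 remaining examples
            errs.append((y[i] - X[i] @ w_i) ** 2)  # squared error on the held-out example
        return float(np.mean(errs))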

Roughly: the noise must have the same variance everywhere in input space
(or "homoscedasticity," as the statisticians would say); the examples must
be selected independently from the same, fixed environment (i.e., "i.i.d.");
the expectation of the squared target must be finite (this just ensures
that the conditional expectations of the target and the noise exist
everywhere); plus some conditions on the network to make it behave nicely.

Under these same conditions, the estimate is additionally "conservative,"
in that it does not (asymptotically, anyway, as N grows large)
underestimate the expected squared error of the network at the optimal weights.

(These results and the prerequisite assumptions are of course 
stated more precisely in the paper.)  

However, we did require an additional assumption to obtain the
"strong" convergence result: the optimal weights must be unique.
This ensures that the weights for each of the deleted
subsets of N-1 examples converge to the weights obtained by training
on all N examples.

As an aside: this latter condition may seem strong, but it appears
(intuitively) to apply to a particular variant of delete-1
cross-validation commonly employed to make its computation more feasible
(in which case the global optima are, in a sense, "locally" unique under
the right conditions). In this variant, the network is trained on
the entire training set to obtain the "base" network.
These weights are then "fine-tuned" upon each of the deleted subsets 
of size N-1 to obtain the N cross-validated weight vectors.
This tends to distribute the fine-tuned weights within a local region
that seems to get tighter as the training set size increases.
It tends to work well in practice, under the right conditions. 
(Essentially, you need to ensure that the ratio of examples to weights
is sufficiently large, and it is easy to detect when this is
not the case.)
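
In case it is useful, here is a rough sketch of that variant, again with
a linear model and a plain gradient-descent trainer standing in for the
network; the learning rate and epoch counts are illustrative assumptions,
not anything prescribed in the paper.

    import numpy as np

    def grad_step(w, X, y, lr=0.05):
        # one gradient step on the mean squared error
        return w - lr * 2.0 * X.T @ (X @ w - y) / len(y)

    def finetuned_delete1_cv(X, y, base_epochs=2000, tune_epochs=50):
        N, d = X.shape
        w = np.zeros(d)
        for _ in range(base_epochs):          # train the "base" network on all N examples
            w = grad_step(w, X, y)
        errs = []
        for i in range(N):
            keep = np.arange(N) != i
            w_i = w.copy()
            for _ in range(tune_epochs):      # fine-tune from the base weights on the N-1 subset
                w_i = grad_step(w_i, X[keep], y[keep])
            errs.append((y[i] - X[i] @ w_i) ** 2)
        return float(np.mean(errs))           # delete-1 estimate from the fine-tuned weights

The saving, of course, is that each of the N fine-tuning runs starts near
the base optimum, so only a few passes over each deleted subset are needed.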

A bit off the original subject, I suppose, but I hope these results
help clarify what cross-validation is doing, at least in that
wonderfully ideal place called "asymptopia."  It (apparently) turns out that
these conditions suffice to ensure that the detrimental effect of a
malicious outlier becomes negligible as the size of the training
set grows large, at least with respect to the estimation of this 
particular kind of generalization by cross-validation.
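
(As a toy illustration of that last point, reusing the delete1_cv sketch
above: corrupt a single target value and watch its contribution to the
delete-1 estimate fade as N grows. The data-generating model below is
purely an assumption for illustration.)

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.1
    for N in (20, 200, 2000):
        X = np.column_stack([np.ones(N), rng.uniform(-1.0, 1.0, size=N)])
        y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=N)
        y[0] += 10.0                    # one maliciously corrupted target
        print(N, delete1_cv(X, y))      # the outlier's effect shrinks as N grows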

= Mark Plutowski
  UCSD: INC and CS&E


P.S. Thank you for the honorable salutation!  Actually, I am 
(still) just a student here.  8-) 8-| 

