some questions on training neural nets

Yong Liu yong at cns.brown.edu
Wed Feb 9 14:42:14 EST 1994


Plutowski (Tue, 08 Feb 1994) wrote

 >No, actually it turns out that delete-1 cross-validation delivers 
 >unbiased estimates of IMSE under fairly reasonable conditions.
 >(More precisely, it delivers estimates of IMSE_N + \sigma^2,
 >for training set size N and noise variance \sigma^2.) 

 >Roughly, the noise must have variance the same everywhere in input space,
 >(or, "homoscedasticity" as the statisticians would say,) with examples
 >selected independently from the same, fixed environment (i.e., "i.i.d.") 
 >the expectation of the squared-target must be finite (this just ensures
 >that conditional expectations of the target and the noise exist everywhere)
 >plus some conditions on the network to make it behave nicely.  

 >For these same conditions, the estimate is additionally "conservative," 
 >in that it does not, (asymptotically, anyway, as N grows large) 
 >underestimate the expected squared error of the network for optimal weights.
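The delete-1 cross-validation procedure Plutowski describes can be sketched briefly. This is a minimal illustration, not code from either post; the linear least-squares fit and the synthetic data are my own assumptions, chosen only to show the estimator. With homoscedastic i.i.d. noise of variance \sigma^2 and a model that can fit the target, the delete-1 estimate should come out near IMSE_N + \sigma^2.

```python
import numpy as np

def loo_cv_mse(x, y, fit, predict):
    """Delete-1 (leave-one-out) cross-validation: train on all
    points but one, test on the held-out point, and average the
    squared errors over all n hold-outs."""
    n = len(x)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(x[mask], y[mask])        # train without point i
        pred = predict(model, x[i:i + 1])    # predict the held-out point
        errors.append((y[i] - pred[0]) ** 2)
    return np.mean(errors)

# Illustrative model: a simple least-squares line fit.
def fit_line(x, y):
    return np.polyfit(x, y, deg=1)

def predict_line(coeffs, x):
    return np.polyval(coeffs, x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
# Homoscedastic noise: the same variance (0.1**2) everywhere.
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=50)
print(loo_cv_mse(x, y, fit_line, predict_line))
```

Here the printed estimate should be close to the noise variance 0.01, since the linear model itself contributes little squared error.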

Outliers are data points that arise in an "unexpected" way, both
in the training data and in the future. For example, the data may be
collected in such a way that a fixed proportion of them are typos. So as
the size of the data set grows, the number of outliers in it grows as well.
Plutowski's assumption, as I understand it, is that the ratio of the number
of outliers to the size of the data set is very small.

One way to look at a data set containing outliers is to assume the noise
is heteroscedastic: outlier data points have noise with large variance,
while good data points have noise with small variance (Liu 1994).
This is different from Plutowski's "homoscedasticity" assumption.
Since we have no intention of predicting the value of outliers,
robust estimation of both the parameters and the generalization error
requires the "removal" of the outliers.
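One common way to carry out such a "removal" before estimating generalization error is to screen residuals against a robust scale estimate. The sketch below is my own illustration of this idea, not a procedure from Liu (1994): it uses the median absolute deviation (MAD), which tolerates a small fraction of contaminated points, and the 3-sigma threshold and line-fit helpers are illustrative assumptions.

```python
import numpy as np

def trim_outliers(x, y, fit, predict, threshold=3.0):
    """Drop points whose residual from a preliminary fit exceeds
    `threshold` times a robust (MAD-based) scale estimate."""
    model = fit(x, y)
    resid = y - predict(model, x)
    mad = np.median(np.abs(resid - np.median(resid)))
    scale = 1.4826 * mad            # makes MAD consistent for Gaussian noise
    keep = np.abs(resid) <= threshold * scale
    return x[keep], y[keep]

def fit_line(x, y):
    return np.polyfit(x, y, deg=1)

def predict_line(coeffs, x):
    return np.polyval(coeffs, x)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=100)   # good points: small variance
y[::10] += rng.normal(0.0, 5.0, size=10)             # outliers: large variance
x_clean, y_clean = trim_outliers(x, y, fit_line, predict_line)
print(len(x_clean))
```

Cross-validation run on (x_clean, y_clean) then estimates the error on the "good" population only, which is the quantity of interest when we do not intend to predict the outliers.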

This discussion, I hope, conveys the idea that when using
cross-validation to estimate generalization error,
some caution should be taken regarding the
influence of bad data in the training set.

------------
Yong Liu
Box 1843
Department of Physics
Institute for Brain and Neural Systems
Brown University
Providence, RI 02912
