No free lunch for Cross Validation!

cherkaue at cs.wisc.edu
Fri Dec 15 19:03:15 EST 1995


In reply to Huaiyu Zhu's message <zhuh at helios.aston.ac.uk>

> ...
>
>A little while ago someone claimed that 
>    Cross validation will benefit from the presence of any structure,
>    and if there is no structure it does no harm; 
>
> ...
>
>Suppose we have a Gaussian variable x, with mean mu and unit variance.
>We have the following three estimators for estimating mu from a
>sample of size n.
>  A: The sample mean.  It is optimal both in the sense of Maximum
>Likelihood and Least Mean Squares.
>  B: The maximum of sample.  It is a bad estimator in any reasonable sense.
>  C: Cross validation to choose between A and B, with one extra data point.
>
>The numerical result with n=16 and averaged over 10000 samples, gives
>mean squared error:
>        A: 0.0627    B: 3.4418    C: 0.5646
>This clearly shows that cross validation IS harmful in this case,
>despite the fact it is based on a larger sample.  NFL still wins!
 

You forgot

   D: Anti-cross validation to choose between A and B, with one extra data
      point.


I don't understand your claim that "cross validation IS harmful in this case."
You seem to equate "harmful" with "suboptimal." Cross validation is a technique
we use to guess the answer when we don't already know the answer. You give
technique A the benefit of your prior knowledge of the true answer, but C must
operate without this knowledge. A fair comparison would pit C against D, not C
against A. As you say:

>6. In any of the above cases, "anti cross validation" would be even
>more disastrous.
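
For anyone who wants to try the C-versus-D comparison, here is a rough
sketch in Python. It is not the original experiment: the message above does
not say exactly how C uses the extra data point, so I assume it picks
whichever of A and B has the smaller squared error on that one held-out
point, and that D picks the other. The function name simulate, the seed, and
the choice mu = 0 are likewise my own assumptions.

```python
import random


def simulate(n=16, trials=10000, mu=0.0, seed=0):
    """Estimate the mean squared error of estimators A, B, C and D.

    A: sample mean of n points.
    B: sample maximum of n points.
    C: whichever of A and B is closer to one extra held-out point
       (an assumed reading of "cross validation with one extra data point").
    D: the estimator that C rejects ("anti-cross validation").
    """
    rng = random.Random(seed)
    sq_err = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    for _ in range(trials):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        extra = rng.gauss(mu, 1.0)  # the one held-out validation point
        a = sum(sample) / n
        b = max(sample)
        # Validation step: compare squared error on the held-out point.
        a_fits_better = (a - extra) ** 2 <= (b - extra) ** 2
        c = a if a_fits_better else b
        d = b if a_fits_better else a
        for name, est in (("A", a), ("B", b), ("C", c), ("D", d)):
            sq_err[name] += (est - mu) ** 2
    return {name: total / trials for name, total in sq_err.items()}


if __name__ == "__main__":
    for name, mse in simulate().items():
        print(f"{name}: {mse:.4f}")
```

Since C and D always pick opposite estimators on each trial, their two mean
squared errors must add up to the sum of those of A and B. So whenever C does
better than the average of A and B, D necessarily does worse.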

Kevin Cherkauer
Computer Sciences Dept.
University of Wisconsin-Madison
cherkauer at cs.wisc.edu


More information about the Connectionists mailing list