No free lunch for Cross Validation!
cherkaue@cs.wisc.edu
Fri Dec 15 19:03:15 EST 1995
In reply to Huaiyu Zhu's message <zhuh at helios.aston.ac.uk>
> ...
>
>A little while ago someone claimed that
> Cross validation will benefit from the presence of any structure,
> and if there is no structure it does no harm;
>
> ...
>
>Suppose we have a Gaussian variable x, with mean mu and unit variance.
>We have the following three estimators for estimating mu from a
>sample of size n.
> A: The sample mean. It is optimal both in the sense of Maximum
>Likelihood and Least Mean Squares.
> B: The maximum of sample. It is a bad estimator in any reasonable sense.
> C: Cross validation to choose between A and B, with one extra data point.
>
>The numerical result with n=16 and averaged over 10000 samples, gives
>mean squared error:
> A: 0.0627 B: 3.4418 C: 0.5646
>This clearly shows that cross validation IS harmful in this case,
>despite the fact it is based on a larger sample. NFL still wins!
You forgot
D: Anti-cross validation to choose between A and B, with one extra data
point.
I don't understand your claim that "cross validation IS harmful in this case."
You seem to equate "harmful" with "suboptimal." Cross validation is a technique
we use to guess the answer when we don't already know the answer. You give
technique A the benefit of your prior knowledge that the sample mean is the
right estimator here, but C must operate without this knowledge. A fair
comparison would pit C against D, not C
against A. As you say:
>6. In any of the above cases, "anti cross validation" would be even
>more disastrous.
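The comparison between C and D can be checked with a small Monte Carlo run. What follows is a minimal sketch, not the code behind the quoted numbers: it assumes that A and B are computed from the n training points and that C and D look only at the one extra held-out point, keeping whichever estimate lies closer to it (C) or farther from it (D). The function name simulate and its parameters are illustrative.

```python
import random


def simulate(n=16, trials=10_000, mu=0.0, seed=0):
    """Monte Carlo mean squared error of estimators A, B, C, D for the mean
    of a unit-variance Gaussian.

    Assumed protocol (a sketch, not the original poster's code): A and B use
    n training points; C and D use one extra held-out point to choose between
    A and B.
    """
    rng = random.Random(seed)
    sq_err = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    for _ in range(trials):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        holdout = rng.gauss(mu, 1.0)  # the one extra data point
        est_a = sum(sample) / n       # A: sample mean
        est_b = max(sample)           # B: sample maximum
        # Cross validation: keep the estimate with the smaller hold-out error.
        a_wins = abs(holdout - est_a) <= abs(holdout - est_b)
        est_c = est_a if a_wins else est_b
        # Anti-cross validation: keep the estimate with the larger hold-out error.
        est_d = est_b if a_wins else est_a
        for name, est in (("A", est_a), ("B", est_b), ("C", est_c), ("D", est_d)):
            sq_err[name] += (est - mu) ** 2
    return {name: total / trials for name, total in sq_err.items()}


print(simulate())
```

Under these assumptions C should land between A and B, and D should do much worse than C. That ordering is the point of the reply: A's low error reflects knowing in advance which estimator to use, while C and D are the two procedures that must decide from the data alone.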
Kevin Cherkauer
Computer Sciences Dept.
University of Wisconsin-Madison
cherkauer at cs.wisc.edu