Cross-validation theory
dhw@santafe.edu
Sat Aug 13 01:19:30 EDT 1994
Mark Plutowski recently said on Connectionists:
>>>
A discussion on the Machine Learning List prompted the question
"Have theoretical conditions been established under which
cross-validation is justified?"
The answer is "Yes."
>>>
As Mark goes on to point out, there are decades
of work on cross-validation from a likelihood-driven
sampling theory perspective. Indeed, Mark's thesis is a
major addition to that literature.
Mark then correctly notes that this literature doesn't *directly*
apply to the discussion on the ML list, since that discussion involves
off-training-set rather than iid error.
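
(To make that distinction concrete, here is a minimal sketch, assuming a
toy finite input space and a learner that simply memorizes its training
set; every particular below is an illustration of mine, not taken from
any of the work cited in this note. The point is that iid error averages
over the whole input distribution, training points included, while
off-training-set error averages only over the points the learner never saw.)

    import random

    random.seed(0)
    X = list(range(20))                 # toy finite input space
    target = {x: x % 2 for x in X}      # "true" function: parity of x
    train_x = random.sample(X, 10)      # training inputs
    memo = {x: target[x] for x in train_x}

    def guess(x):
        # Memorize the training set; guess 0 everywhere else.
        return memo.get(x, 0)

    # iid error: zero-one loss averaged over the full (uniform) distribution.
    iid_err = sum(guess(x) != target[x] for x in X) / len(X)

    # Off-training-set error: same loss, but only over unseen points.
    ots = [x for x in X if x not in train_x]
    ots_err = sum(guess(x) != target[x] for x in ots) / len(ots)

    print("iid error:", iid_err, "off-training-set error:", ots_err)

The iid figure is diluted by the memorized training points; the
off-training-set figure is not, which is why the two can behave very
differently.
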
It should be noted that there is another important distinction between
the framework Mark uses and the implicit framework in the ML list
discussion; the latter has been concerned with zero-one loss, whereas
Mark's work concentrates on quadratic loss. The no-free-lunch results
being discussed on the ML list change form drastically if one uses
quadratic rather than zero-one loss. That should not be too surprising,
given, for example, the results in Michael Perrone's work on combining
estimators under quadratic loss.
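
(Again as an illustration only: here is a minimal sketch of how the
choice of loss enters a cross-validation estimate. The data-generating
process, the constant-prediction "model," and the fold scheme are all
assumptions chosen for the example, not anyone's published method.)

    import random

    random.seed(1)
    n, k = 100, 5
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [1 if x + random.gauss(0, 0.3) > 0 else 0 for x in xs]

    def fit_mean(train):
        # "Model": predict the training-set mean of y (a constant).
        return sum(y for _, y in train) / len(train)

    quad = zo = 0.0
    pairs = list(zip(xs, ys))
    for fold in range(k):
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        m = fit_mean(train)
        # Quadratic loss on the held-out fold.
        quad += sum((y - m) ** 2 for _, y in test) / len(test)
        # Zero-one loss on the same fold, thresholding at 0.5.
        zo += sum((m > 0.5) != y for _, y in test) / len(test)

    print("CV quadratic:", quad / k, "CV zero-one:", zo / k)

The two scores are estimates of different quantities, and in general
they need not rank candidate models the same way.
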
It's also worth noting that even in the regime of iid error and quadratic
loss, there is still much to be understood. For example, in the
average-data scenario of sampling theory statistics that Mark uses,
asymptotic properties are better understood than finite-data properties.
And in the Bayesian this-data framework, very little is known for any
data-size regime (though some work by Dawid on this subject comes to mind).
David Wolpert