Paper on leave-one-out cross-validation available
Michael J. Kearns
mkearns@research.att.com
Thu Feb 6 15:25:58 EST 1997
The following paper is now available in compressed PostScript
format at http://www.research.att.com/~mkearns under the
"Model Selection/Complexity Regularization" section of my
publications.
Algorithmic Stability and Sanity-Check Bounds
for Leave-One-Out Cross-Validation
Michael Kearns (AT&T Labs) and Dana Ron (MIT)
In this paper we prove "sanity-check" bounds for the error of the
leave-one-out cross-validation estimate of the generalization error:
that is, bounds showing that the worst-case error of this estimate
is not much worse than that of the training error estimate. The name
"sanity-check" refers to the fact that although we often expect the
leave-one-out estimate to perform considerably better than the training
error estimate, we are here only seeking assurance that its performance
will not be considerably worse. Perhaps surprisingly, such assurance
has been given only for rather limited cases in the prior literature on
cross-validation.
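
For readers who want the two estimates side by side, here is a minimal
sketch in Python (not from the paper; the 1-nearest-neighbor learner and
the synthetic data are illustrative assumptions):

    # Contrast the training error estimate with the leave-one-out
    # cross-validation estimate of the generalization error.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

    # Training error estimate: train on all m points, test on the same m points.
    h = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    train_err = np.mean(h.predict(X) != y)

    # Leave-one-out estimate: for each i, train on the other m-1 points
    # and test on the held-out point; average the m per-point errors.
    loo = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        h_i = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
        loo.append(h_i.predict(X[i:i + 1])[0] != y[i])
    loo_err = np.mean(loo)

    print("training error estimate:", train_err)  # 0.0 for 1-NN: wildly optimistic
    print("leave-one-out estimate: ", loo_err)    # close to the true error

With 1-nearest-neighbor the training error is identically zero (every
point is its own nearest neighbor), which is exactly the kind of optimism
that the leave-one-out estimate is meant to correct.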
Any nontrivial bound on the error of leave-one-out must rely on some
notion of algorithmic stability. Previous bounds relied on the rather
strong notion of hypothesis stability, whose application was primarily
limited to nearest-neighbor and other local algorithms. Here we introduce
the new and weaker notion of error stability, and apply it to obtain
sanity-check bounds for leave-one-out for other classes of learning
algorithms, including training error minimization procedures and Bayesian
algorithms. We also prove lower bounds showing that error stability is
necessary for bounding the error of the leave-one-out estimate, and that
for training error minimization algorithms, such bounds must in the worst
case still depend on the Vapnik-Chervonenkis dimension of the hypothesis
class.
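
A rough way to see the distinction between the two stability notions is
the following Monte-Carlo sketch (again in Python; the estimators and the
toy distribution are illustrative assumptions, not the paper's formal
definitions):

    # Hypothesis stability asks whether the hypotheses trained on S and on
    # S minus one point disagree on fresh examples; error stability asks
    # only whether their overall error rates differ.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)

    def sample(m):
        X = rng.normal(size=(m, 2))
        y = (X[:, 0] > 0).astype(int)
        return X, y

    m, trials = 50, 30
    X_big, y_big = sample(5000)   # large sample standing in for the distribution
    hyp_gap, err_gap = [], []
    for _ in range(trials):
        X, y = sample(m)
        h = KNeighborsClassifier(n_neighbors=1).fit(X, y)
        h_i = KNeighborsClassifier(n_neighbors=1).fit(X[1:], y[1:])  # delete one point
        p, p_i = h.predict(X_big), h_i.predict(X_big)
        hyp_gap.append(np.mean(p != p_i))  # pointwise disagreement
        err_gap.append(abs(np.mean(p != y_big) - np.mean(p_i != y_big)))

    print("hypothesis stability (avg disagreement):", np.mean(hyp_gap))
    print("error stability (avg error difference): ", np.mean(err_gap))

The second quantity can be small even when the first is not: the two
hypotheses may disagree on many individual points while their error rates
remain nearly identical, which is why error stability is the weaker and
more broadly applicable requirement.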