Paper on leave-one-out cross-validation available
Michael J. Kearns
mkearns@research.att.com
Thu Feb 6 15:25:58 EST 1997
The following paper is now available in compressed PostScript
format at http://www.research.att.com/~mkearns under the
"Model Selection/Complexity Regularization" section of my
publications.
Algorithmic Stability and Sanity-Check Bounds
for Leave-One-Out Cross-Validation
Michael Kearns (AT&T Labs) and Dana Ron (MIT)
In this paper we prove "sanity-check" bounds for the error of the
leave-one-out cross-validation estimate of the generalization error:
that is, bounds showing that the worst-case error of this estimate
is not much worse than that of the training error estimate. The name
"sanity-check" refers to the fact that although we often expect the
leave-one-out estimate to perform considerably better than the training
error estimate, we are here only seeking assurance that its performance
will not be considerably worse. Perhaps surprisingly, such assurance
has been given only for rather limited cases in the prior literature on
cross-validation.
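
For readers who want the two estimates side by side, here is a minimal
sketch in Python (not from the paper; the 1-nearest-neighbor learner and
the synthetic data are illustrative assumptions):

    # Contrast the training error estimate with the leave-one-out
    # cross-validation estimate of the generalization error.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

    # Training error estimate: train on all m points, test on the same m points.
    h = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    train_err = np.mean(h.predict(X) != y)

    # Leave-one-out estimate: for each i, train on the other m-1 points
    # and test on the held-out point; average the m per-point errors.
    loo = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        h_i = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
        loo.append(h_i.predict(X[i:i + 1])[0] != y[i])
    loo_err = np.mean(loo)

    print("training error estimate:", train_err)  # 0.0 for 1-NN: wildly optimistic
    print("leave-one-out estimate: ", loo_err)    # close to the true error

With 1-nearest-neighbor the training error is identically zero (every
point is its own nearest neighbor), which is exactly the kind of optimism
that the leave-one-out estimate is meant to correct.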
Any nontrivial bound on the error of leave-one-out must rely on some
notion of algorithmic stability. Previous bounds relied on the rather
strong notion of hypothesis stability, whose application was primarily
limited to nearest-neighbor and other local algorithms. Here we introduce
the new and weaker notion of error stability, and apply it to obtain
sanity-check bounds for leave-one-out for other classes of learning
algorithms, including training error minimization procedures and Bayesian
algorithms. We also prove lower bounds showing that error stability is
necessary for bounding the error of the leave-one-out estimate, and that
for training error minimization algorithms, such bounds must in the worst
case still depend on the Vapnik-Chervonenkis dimension of the hypothesis
class.
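
A rough way to see the distinction between the two stability notions is
the following Monte-Carlo sketch (again in Python; the estimators and the
toy distribution are illustrative assumptions, not the paper's formal
definitions):

    # Hypothesis stability asks whether the hypotheses trained on S and on
    # S minus one point disagree on fresh examples; error stability asks
    # only whether their overall error rates differ.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)

    def sample(m):
        X = rng.normal(size=(m, 2))
        y = (X[:, 0] > 0).astype(int)
        return X, y

    m, trials = 50, 30
    X_big, y_big = sample(5000)   # large sample standing in for the distribution
    hyp_gap, err_gap = [], []
    for _ in range(trials):
        X, y = sample(m)
        h = KNeighborsClassifier(n_neighbors=1).fit(X, y)
        h_i = KNeighborsClassifier(n_neighbors=1).fit(X[1:], y[1:])  # delete one point
        p, p_i = h.predict(X_big), h_i.predict(X_big)
        hyp_gap.append(np.mean(p != p_i))  # pointwise disagreement
        err_gap.append(abs(np.mean(p != y_big) - np.mean(p_i != y_big)))

    print("hypothesis stability (avg disagreement):", np.mean(hyp_gap))
    print("error stability (avg error difference): ", np.mean(err_gap))

The second quantity can be small even when the first is not: the two
hypotheses may disagree on many individual points while their error rates
remain nearly identical, which is why error stability is the weaker and
more broadly applicable requirement.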