SVM's and the GACV

Grace Wahba wahba at stat.wisc.edu
Wed Apr 7 19:19:17 EDT 1999


The following paper was the basis for my 
NIPS*98 Large Margin Classifier Workshop talk.
It is now available as University of Wisconsin-Madison
Statistics Dept TR1006 at

http://www.stat.wisc.edu/~wahba -> TRLIST
..................................................
Generalized Approximate Cross Validation For 
Support Vector Machines, or, Another Way to 
Look at Margin-Like Quantities. 

Grace Wahba, Yi Lin and Hao Zhang.

             Abstract

We first review the steps connecting the
Support Vector Machine (SVM) paradigm in
reproducing kernel Hilbert space to the
(dual) mathematical programming problem
traditional in SVM classification problems.
We then review the Generalized Comparative
Kullback-Leibler Distance (GCKL) for the SVM
paradigm and observe that it is trivially a simple 
upper bound on the expected misclassification rate. 
Next we revisit the Generalized Approximate
Cross Validation (GACV) as a computable proxy for
the GCKL, as a function of certain tuning 
parameters in SVM kernels. We have found a justifiable 
(new) approximation for the GACV which is readily computed 
exactly along with the SVM solution to the dual 
mathematical programming problem.  This GACV 
turns out, interestingly but not surprisingly, to
be simply related to what several authors
have identified as the (observed) VC dimension
of the estimated SVM. Preliminary simulations
in a special case suggest that the minimizer of
the GACV is a good estimate of the
minimizer of the GCKL, although further simulation
and theoretical studies are warranted. It is hoped
that this preliminary work will lead to better 
understanding of `tuning' issues in the 
optimization of SVM's and related classifiers.
.................................................
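As a reading aid, the following is a minimal sketch of the standard
soft-margin SVM problem in a reproducing kernel Hilbert space, its
familiar dual quadratic program, and the elementary inequality behind
the GCKL upper bound on the misclassification rate. The notation
(lambda, the kernel K, the multipliers alpha_i, the box constant C)
is the generic textbook convention and not necessarily that of
TR1006; the new GACV approximation itself is defined in the report.

% Standard SVM in an RKHS H_K, generic notation: estimate f = h + b,
% h in H_K, by minimizing the hinge loss plus a squared-norm penalty,
% with labels y_i in {-1,+1} and (u)_+ = max(u,0).
\[
  \min_{f = h + b,\ h \in \mathcal{H}_K}\
  \frac{1}{n}\sum_{i=1}^{n}\bigl(1 - y_i f(x_i)\bigr)_{+}
  \ +\ \lambda \|h\|_{\mathcal{H}_K}^{2}
\]
% By the representer theorem h(x) = \sum_j c_j K(x, x_j), and the
% minimization is usually carried out via the familiar dual
% quadratic program (C is the box constant implied by lambda and n):
\[
  \max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \frac{1}{2}\sum_{i,j}\alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j),
  \qquad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{n}\alpha_i y_i = 0.
\]
% The "trivial" bound mentioned in the abstract comes from the fact
% that the hinge loss dominates the 0-1 misclassification loss,
\[
  \bigl(1 - y f(x)\bigr)_{+} \ \ge\ \mathbf{1}\bigl[\, y f(x) \le 0 \,\bigr],
\]
% so the GCKL, being an expectation of a hinge-type quantity at the
% SVM solution, upper bounds the expected misclassification rate.

The GACV described in the abstract is a computable proxy for this
GCKL, evaluated from quantities already available in the dual
solution; its exact form is given in TR1006.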

