Bias + variance for classification

Ronny Kohavi ronnyk at starry.engr.sgi.com
Sun Apr 21 02:22:50 EDT 1996



The following paper will appear in the Proceedings of the Thirteenth
International Conference on Machine Learning, 1996.  It is available
at http://reality.sgi.com/ronnyk under publications (with some slides
containing more results) or by anonymous ftp from
ftp://starry.stanford.edu/pub/ronnyk/biasVar.ps

There have been some recent announcements of tech reports on
bias-variance decompositions in classification domains (0-1 loss).
In our paper we address the desiderata for good bias-variance
decompositions and show some problems with other decompositions.
We also address an important issue related to the naive estimation of
these quantities using frequency counts and offer a correction.



     Bias Plus Variance Decomposition for Zero-One Loss Functions

                Ron Kohavi                  David H. Wolpert
       Data Mining and Visualization     
          Silicon Graphics, Inc.         The Santa Fe Institute 
             ronnyk at sgi.com                  dhw at santafe.edu

    

We present a bias-variance decomposition of expected misclassification
rate, the most commonly used loss function in supervised
classification learning.  The bias-variance decomposition for
quadratic loss functions is well known and serves as an important tool
for analyzing learning algorithms, yet no decomposition was offered
for the more commonly used zero-one (misclassification) loss functions
until the recent work of Kong & Dietterich [1995] and Breiman [1996].
Their decomposition, however, suffers from some major shortcomings
(e.g., potentially negative variance), which our decomposition avoids.
We show that, in practice, the naive frequency-based estimation of the
decomposition terms is itself biased, and we show how to correct for
this bias.  We illustrate the decomposition on various algorithms and
datasets from the UCI repository.
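
For readers who want a concrete picture, here is a minimal Python
sketch of the decomposition at a single test point x, assuming the
form given in the paper: noise, squared bias, and variance are each
half of a sum over class labels, and together they add up to the
expected zero-one loss at x.  The function name kw_decomposition and
its arguments are hypothetical, chosen only for illustration.  The
hypothesis distribution is estimated here by naive frequency counts
over repeated training runs, which is exactly the estimate the
abstract describes as biased; the paper's correction is not
reproduced in this sketch.

```python
from collections import Counter


def kw_decomposition(target_probs, predictions, classes):
    """Noise, squared bias, and variance at one test point x.

    target_probs: dict mapping class -> P(Y_F = y | x), the target
                  distribution (assumed known for this illustration).
    predictions:  labels predicted for x by models trained on
                  different training sets.
    classes:      iterable of all class labels.
    """
    classes = list(classes)
    n = len(predictions)
    counts = Counter(predictions)
    # Naive frequency-count estimate of P(Y_H = y | x).
    hyp_probs = {y: counts.get(y, 0) / n for y in classes}
    tgt = {y: target_probs.get(y, 0.0) for y in classes}

    noise = 0.5 * (1.0 - sum(p * p for p in tgt.values()))
    bias2 = 0.5 * sum((tgt[y] - hyp_probs[y]) ** 2 for y in classes)
    variance = 0.5 * (1.0 - sum(p * p for p in hyp_probs.values()))
    return noise, bias2, variance


# Deterministic target "a"; three of four trained models predict "a".
noise, bias2, variance = kw_decomposition(
    {"a": 1.0, "b": 0.0}, ["a", "a", "a", "b"], ["a", "b"]
)
print(noise, bias2, variance)  # 0.0 0.0625 0.1875
```

In this example the three terms sum to 0.25, the fraction of the four
models that misclassify x, and the variance term is nonnegative by
construction.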


--

   Ronny Kohavi (ronnyk at sgi.com)



