Paper available: Statistical Tests for Comparing Supervised Classification Learning Algorithms

Tom Dietterich tgd at chert.CS.ORST.EDU
Thu Oct 17 01:16:19 EDT 1996


The following paper is available from
<ftp://ftp.cs.orst.edu/pub/tgd/papers/stats.ps.gz> 

**Hardcopies are not available**

Statistical Tests for Comparing Supervised Classification Learning Algorithms
                         Thomas G. Dietterich
                    Department of Computer Science
                       Oregon State University
                         Corvallis, OR 97331

Abstract:

This paper reviews five statistical tests for determining whether one
learning algorithm outperforms another on a particular learning task.
These tests are compared experimentally to determine their probability
of incorrectly detecting a difference when no difference exists (Type
I error).  Two widely used statistical tests are shown to have high
probability of Type I error in certain situations and should never be
used.  These tests are (a) a test for the difference of two
proportions and (b) a paired-differences $t$ test based on taking
several random train/test splits.  A third test, a paired-differences
$t$ test based on 10-fold cross-validation, exhibits somewhat elevated
probability of Type I error.  A fourth test, McNemar's test, is shown
to have low Type I error.  The fifth test is a new test, 5x2cv, based
on 5 iterations of 2-fold cross-validation.  Experiments show that
this test also has acceptable Type I error.  The paper also measures the
power (ability to detect algorithm differences when they do exist) of
these tests.  The 5x2cv test is shown to be slightly more powerful
than McNemar's test.  The choice of the best test is determined by the
computational cost of running the learning algorithm.  For algorithms
that can be executed only once, McNemar's test is the only test with
acceptable Type I error.  For algorithms that can be executed ten
times, the 5x2cv test is recommended, because it is slightly more
powerful and because it directly measures variation due to the choice
of training set.
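
The abstract names the two recommended tests but gives no formulas.
The sketch below is not taken from this announcement; it follows the
commonly cited formulations of McNemar's test (with continuity
correction) and of the 5x2cv paired t statistic. The function names
mcnemar_test and five_by_two_cv_t, their argument layouts, and the
example numbers are all assumptions made for illustration. Consult
the paper for the authoritative definitions.

```python
import math


def mcnemar_test(n01, n10):
    """McNemar's test with continuity correction (assumed formulation).

    n01: test examples misclassified by algorithm A but not by B.
    n10: test examples misclassified by algorithm B but not by A.
    Returns (statistic, p_value); the statistic is compared against a
    chi-square distribution with 1 degree of freedom.
    """
    if n01 + n10 == 0:
        # The classifiers never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def five_by_two_cv_t(diffs):
    """5x2cv paired t statistic (assumed formulation).

    diffs: five (p1, p2) pairs, one per replication of 2-fold
    cross-validation, where p1 and p2 are the differences in error
    rate (A minus B) measured on the two folds.
    The result is compared against a t distribution with 5 degrees
    of freedom (two-sided 5% critical value is about 2.571).
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 replications")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    denom = math.sqrt(sum(variances) / 5.0)
    if denom == 0.0:
        raise ValueError("zero variance estimate; statistic is undefined")
    # The numerator is the difference from the first fold of the first replication.
    return diffs[0][0] / denom


if __name__ == "__main__":
    # Made-up counts and differences, for illustration only.
    stat, p = mcnemar_test(10, 20)
    print("McNemar statistic:", stat, "p-value:", p)
    t = five_by_two_cv_t([(0.02, 0.04)] * 5)
    print("5x2cv t statistic:", t)
```

Only the discordant counts enter McNemar's test, which is why it needs
a single run of each algorithm, whereas the 5x2cv statistic needs the
ten train/test runs mentioned in the abstract.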

-- 
Thomas G. Dietterich              Voice: 541-737-5559
Department of Computer Science    FAX:   541-737-3014
Dearborn Hall, 303                URL:   http://www.cs.orst.edu/~tgd
Oregon State University
Corvallis, OR 97331-3102


