paper: Error Correcting Output Codes

Tue Aug 30 19:07:33 EDT 1994

The following paper is available at URL:
"ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-ecoc.ps.gz" 

               Solving Multiclass Learning Problems via
                    Error-Correcting Output Codes

                Thomas G. Dietterich  tgd at cs.orst.edu
                   Department of Computer Science,
                          303 Dearborn Hall
                       Oregon State University
                       Corvallis, OR 97331 USA

                            Ghulum Bakiri
                    Department of Computer Science
                        University of Bahrain
                          Isa Town, Bahrain

    Multiclass learning problems involve finding a definition for an
    unknown function f(x) whose range is a discrete set containing k>2
    values (i.e., k "classes").  The definition is acquired by studying
    large collections of training examples of the form <x_i, f(x_i)>.
    Existing approaches to multiclass learning problems include (a) direct
    application of multiclass algorithms such as the decision-tree
    algorithms C4.5 and CART, (b) application of binary concept learning
    algorithms to learn individual binary functions for each of the k
    classes, and (c) application of binary concept learning algorithms
    with distributed output representations such as those employed by
    Sejnowski and Rosenberg in the NETtalk system.  This paper compares
    these three approaches to a new technique in which error-correcting
    codes are employed as a distributed output representation.  We show
    that these output representations improve the generalization
    performance of both C4.5 and backpropagation on a wide range of
    multiclass learning tasks.  We also demonstrate that this approach is
    robust with respect to changes in the size of the training sample, the
    assignment of distributed representations to particular classes, and
    the application of overfitting avoidance techniques such as
    decision-tree pruning.  Finally, we show that--like the other
    methods--the error-correcting code technique can provide reliable
    class probability estimates.  Taken together, these results
    demonstrate that error-correcting output codes provide a
    general-purpose method for improving the performance of inductive
    learning programs on multiclass problems.
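
    The abstract's central idea, an error-correcting code used as a
    distributed output representation, can be illustrated with a minimal
    sketch.  The 7-bit code matrix below for k = 4 classes is a
    hypothetical example chosen for illustration, not one taken from the
    paper, and the names CODE_MATRIX, hamming, min_row_distance, and
    decode are likewise assumptions.  Each column would correspond to
    one binary function learned by an algorithm such as C4.5 or
    backpropagation; here the learners' outputs are simply written down
    by hand.

```python
# Hypothetical 7-bit code matrix for k = 4 classes (illustration only).
# Row i is the codeword assigned to class i; column j defines the binary
# function that the j-th learner would be trained on.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(matrix):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(
        hamming(matrix[i], matrix[j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    )


def decode(bits, matrix=CODE_MATRIX):
    """Return the index of the class whose codeword is nearest to bits."""
    distances = [hamming(bits, row) for row in matrix]
    return distances.index(min(distances))


# Codeword of class 2 with one learner's output flipped (position 5).
noisy = [0, 0, 1, 1, 0, 1, 1]

print(min_row_distance(CODE_MATRIX))  # minimum distance d between codewords
print(decode(noisy))                  # nearest-codeword class for noisy
```

    With minimum distance d between codewords, nearest-codeword
    decoding recovers the correct class whenever at most floor((d-1)/2)
    of the binary functions are wrong.  In this sketch d = 4, so a
    single incorrect bit is tolerated.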

Thomas G. Dietterich              Voice: 503-737-5559
Department of Computer Science    FAX:   503-737-3014
Dearborn Hall, 303                URL:   http://www.cs.orst.edu/~tgd/index.html
Oregon State University
Corvallis, OR 97331-3102