paper: Error Correcting Output Codes
Tom Dietterich
tgd at chert.CS.ORST.EDU
Tue Aug 30 19:07:33 EDT 1994
The following paper is available at URL:
"ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-ecoc.ps.gz"
Solving Multiclass Learning Problems via
Error-Correcting Output Codes
Thomas G. Dietterich tgd at cs.orst.edu
Department of Computer Science,
303 Dearborn Hall
Oregon State University
Corvallis, OR 97331 USA
Ghulum Bakiri
Department of Computer Science
University of Bahrain
Isa Town, Bahrain
Multiclass learning problems involve finding a definition for an
unknown function f(x) whose range is a discrete set containing k>2
values (i.e., k "classes"). The definition is acquired by studying
large collections of training examples of the form <x_i, f(x_i)>.
Existing approaches to multiclass learning problems include (a) direct
application of multiclass algorithms such as the decision-tree
algorithms C4.5 and CART, (b) application of binary concept learning
algorithms to learn individual binary functions for each of the k
classes, and (c) application of binary concept learning algorithms
with distributed output representations such as those employed by
Sejnowski and Rosenberg in the NETtalk system. This paper compares
these three approaches to a new technique in which error-correcting
codes are employed as a distributed output representation. We show
that these output representations improve the generalization
performance of both C4.5 and backpropagation on a wide range of
multiclass learning tasks. We also demonstrate that this approach is
robust with respect to changes in the size of the training sample, the
assignment of distributed representations to particular classes, and
the application of overfitting avoidance techniques such as
decision-tree pruning. Finally, we show that--like the other
methods--the error-correcting code technique can provide reliable
class probability estimates. Taken together, these results
demonstrate that error-correcting output codes provide a
general-purpose method for improving the performance of inductive
learning programs on multiclass problems.
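
The idea of an error-correcting output code can be illustrated with a
small sketch. Everything in it is an assumption made for illustration
and is not taken from the paper: the 4-class, 7-bit code matrix, the
class labels c1..c4, and the helper names hamming, min_distance, and
decode. Each class gets a codeword, one binary learner would be trained
per bit position, and a new example is assigned to the class whose
codeword is nearest in Hamming distance to the vector of predicted bits.

```python
from itertools import combinations

def hamming(u, v):
    """Number of bit positions in which two codewords differ."""
    return sum(a != b for a, b in zip(u, v))

def min_distance(code):
    """Smallest Hamming distance between any two codewords in the code."""
    return min(hamming(u, v) for u, v in combinations(code.values(), 2))

def decode(bits, code):
    """Return the class label whose codeword is closest to the predicted bits."""
    return min(code, key=lambda label: hamming(bits, code[label]))

# Hypothetical error-correcting code: 4 classes, 7 bits per codeword.
ECOC = {
    "c1": (1, 1, 1, 1, 1, 1, 1),
    "c2": (0, 0, 0, 0, 1, 1, 1),
    "c3": (0, 0, 1, 1, 0, 0, 1),
    "c4": (0, 1, 0, 1, 0, 1, 0),
}

# One-per-class representation, shown for comparison.
ONE_PER_CLASS = {
    "c1": (1, 0, 0, 0),
    "c2": (0, 1, 0, 0),
    "c3": (0, 0, 1, 0),
    "c4": (0, 0, 0, 1),
}

# A code with minimum distance d can correct floor((d - 1) / 2) bit errors.
ecoc_d = min_distance(ECOC)
opc_d = min_distance(ONE_PER_CLASS)

# Simulate the 7 binary learners predicting class c3 with one bit wrong.
noisy = list(ECOC["c3"])
noisy[0] ^= 1
recovered = decode(noisy, ECOC)

print("ECOC min distance:", ecoc_d, "correctable errors:", (ecoc_d - 1) // 2)
print("One-per-class min distance:", opc_d, "correctable errors:", (opc_d - 1) // 2)
print("Decoded class after one bit error:", recovered)
```

In this sketch the one-per-class code has minimum distance 2, so a
single wrong bit can leave the decoder with a tie, while the longer
code still recovers the intended class.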
Thomas G. Dietterich              Voice: 503-737-5559
Department of Computer Science    FAX:   503-737-3014
Dearborn Hall, 303                URL:   http://www.cs.orst.edu/~tgd/index.html
Oregon State University
Corvallis, OR 97331-3102