Combining classifiers - a study of error reduction

Kamal M Ali ali at nubian.ICS.UCI.EDU
Tue Sep 5 17:59:04 EDT 1995


FTP-host: ftp.ics.uci.edu
FTP-file: pub/machine-learning-papers/others/Ali-TR95-MultDecTrees.ps.Z
          pub/machine-learning-papers/others/Ali-TR95-MultRuleSets.ps.Z

Available by anonymous ftp.

We examine how the error reduction ability of an ensemble (the error rate
of the ensemble divided by the error rate of a single model learned from
the same data) is affected by the degree to which the models in the
ensemble make correlated errors. The linear relationship discovered
between error reduction ability and error correlatedness is shown to hold
for rule sets and decision trees, and our on-going research shows that it
also holds for neural networks.
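As a rough illustration of the two quantities above, here is a minimal sketch
(the papers' precise definitions and combination scheme may differ; the
measure of error correlatedness below is an illustrative stand-in):

```python
# Sketch of "error reduction" (ensemble error rate / single-model error
# rate) and a simple pairwise measure of how correlated the models'
# errors are. Illustrative only; not the papers' exact formulation.
from itertools import combinations

def error_rate(preds, labels):
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

def majority_vote(all_preds):
    # all_preds: one list of predictions per model; column-wise vote
    return [max(set(col), key=col.count) for col in zip(*all_preds)]

def error_correlatedness(all_preds, labels):
    # Fraction of examples on which BOTH models of a pair err,
    # averaged over all pairs of models.
    pairs = list(combinations(all_preds, 2))
    def both_err(a, b):
        return sum((p != y) and (q != y)
                   for p, q, y in zip(a, b, labels)) / len(labels)
    return sum(both_err(a, b) for a, b in pairs) / len(pairs)

labels = [0, 0, 0, 0, 0, 0]
models = [
    [1, 0, 0, 0, 0, 0],   # each model errs on a different example,
    [0, 1, 0, 0, 0, 0],   # so no two models err together:
    [0, 0, 1, 0, 0, 0],   # error_correlatedness is 0
]
ensemble = majority_vote(models)
single = error_rate(models[0], labels)             # 1/6
reduction = error_rate(ensemble, labels) / single  # 0.0: vote fixes all errors
```

With minimally correlated errors, the vote corrects every individual
mistake, driving the ratio to zero; as the models' errors overlap more,
the ratio climbs toward 1.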

================================================================
First paper:

On the Link between Error Correlation and Error Reduction in Decision
                Tree Ensembles

Abstract
Recent work has shown that learning an ensemble consisting of multiple
models and then making classifications by combining the classifications of
the models often leads to more accurate classifications than those based on
a single model learned from the same data.  However, the amount of error
reduction achieved varies from data set to data set.  This paper provides
empirical evidence that there is a linear relationship between the degree of
error reduction and the degree to which patterns of errors made by
individual models are uncorrelated.  Ensemble error rate is most reduced in
ensembles whose constituents make individual errors in a less correlated
manner.  The second result of the work is that some of the greatest error
reductions occur on domains for which many ties in information gain occur
during learning.  The third result is that ensembles consisting of models
that make errors in a dependent but ``negatively correlated'' manner will
have lower ensemble error rates than ensembles whose constituents make
errors in an uncorrelated manner.  Previous work has aimed at learning
models that make errors in an uncorrelated manner rather than those that make
errors in a ``negatively correlated'' manner.  Taken together, these
results help provide an understanding of why the multiple models approach
yields great error reduction in some domains but little in others.
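The third result above admits a quick back-of-envelope check. Assuming a
majority-vote ensemble of three models, each with individual error rate 1/3
(numbers chosen for illustration, not taken from the papers):

```python
# Majority-vote error under two error-dependence regimes.
from math import comb

def majority_error(p, n):
    # Probability that a majority of n models err, assuming each errs
    # INDEPENDENTLY with probability p (the "uncorrelated" case).
    k0 = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k0, n + 1))

# Uncorrelated (independent) errors: ensemble still errs sometimes.
independent = majority_error(1/3, 3)  # 7/27, about 0.26

# Perfectly negatively correlated errors: the models' mistakes fall on
# disjoint examples, so at most one model errs per example and the
# majority is always right.
negatively_correlated = 0.0
```

Independence alone leaves a residual ensemble error of about 0.26 here,
while disjoint (negatively correlated) errors eliminate it entirely, which
is why negatively correlated ensembles can beat merely uncorrelated ones.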

================================================================
Second paper:

Error reduction through learning multiple descriptions

Abstract
Learning multiple descriptions for each class in the data has been shown to
reduce generalization error but the amount of error reduction varies greatly
from domain to domain.  This paper presents a novel empirical analysis that
helps to understand this variation.  Our hypothesis is that the amount of
error reduction is linked to the ``degree to which the descriptions for a
class make errors in a correlated manner.''  We present a precise and novel
definition for this notion and use twenty-nine data sets to show that the
amount of observed error reduction is negatively correlated with the degree
to which the descriptions make errors in a correlated manner.  We
empirically show that it is possible to learn descriptions that make less
correlated errors in domains in which many ties in the search evaluation
measure (e.g. information gain) are experienced during learning.  The paper
also presents results that help to understand when and why multiple
descriptions are a help (irrelevant attributes) and when they are not as
much help (large amounts of class noise).
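To illustrate the tie phenomenon mentioned above, here is a small sketch
using the standard ID3-style information gain (the papers' learners and
tie-handling may differ; the data set below is made up):

```python
# When two attributes have identical information gain, a randomized
# tie-breaker can pick either, so repeated runs can produce
# syntactically different descriptions that err on different examples.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # Entropy of the labels minus the weighted entropy after
    # partitioning the examples by the attribute's values.
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        sub = [y for x, y in zip(values, labels) if x == v]
        gain -= len(sub) / n * entropy(sub)
    return gain

labels = [0, 0, 1, 1]
a1 = [0, 0, 1, 1]   # perfectly predictive attribute
a2 = [1, 1, 0, 0]   # equally predictive: the gains tie exactly
g1, g2 = info_gain(a1, labels), info_gain(a2, labels)
assert g1 == g2 == 1.0
```

Each tie is a branch point in the search: breaking ties differently yields
distinct models, which is one route to the less correlated errors that the
analysis links to greater error reduction.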


More information about the Connectionists mailing list