Paper on boosting.
Jerome H. Friedman
jhf at stat.Stanford.EDU
Thu Jul 23 19:26:34 EDT 1998
*** Technical Report Available ***
Additive Logistic Regression: a Statistical View of Boosting
Jerome Friedman
(jhf at stat.stanford.edu)
Trevor Hastie
(trevor at stat.stanford.edu)
Robert Tibshirani
(tibs at utstat.toronto.edu)
ABSTRACT
Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of
the most important recent developments in classification
methodology. The performance of many classification algorithms often
can be dramatically improved by sequentially applying them to
reweighted versions of the input data, and taking a weighted majority
vote of the sequence of classifiers thereby produced. We show that
this seemingly mysterious phenomenon can be understood in terms of
well-known statistical principles, namely additive modeling and
maximum likelihood. For the two-class problem, boosting can be viewed
as an approximation to additive modeling on the logistic scale using
maximum Bernoulli likelihood as a criterion. We develop more direct
approximations and show that they produce results nearly identical to
those of boosting. Direct multi-class generalizations based on
multinomial likelihood are derived that exhibit performance comparable
to other recently proposed multi-class generalizations of boosting in
most situations, and far superior in some. We suggest a minor
modification to boosting that can reduce computation, often by factors
of 10 to 50. Finally, we apply these insights to produce an
alternative formulation of boosting decision trees. This approach,
based on best-first truncated tree induction, often leads to better
performance, and can provide interpretable descriptions of the
aggregate decision rule. It is also much faster computationally, making
it more suitable for large-scale data mining applications.
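The "logistic scale" claim can be made concrete with a short derivation. It is restated here in our own notation: labels y in {-1, +1}, additive predictor F(x), and y* = (y+1)/2 in {0, 1}. Minimizing the expected exponential criterion E[exp(-yF(x))] gives half the log-odds. So F is a fit on the logistic scale, and the Bernoulli log-likelihood takes the form in the fourth line. The last line is the symmetric multiple-logistic form we assume for a J-class generalization.

```latex
E\left[e^{-yF(x)} \mid x\right] = P(y=1 \mid x)\, e^{-F(x)} + P(y=-1 \mid x)\, e^{F(x)}

\frac{\partial}{\partial F(x)} E\left[e^{-yF(x)} \mid x\right] = 0
\;\Longrightarrow\;
F(x) = \frac{1}{2} \log \frac{P(y=1 \mid x)}{P(y=-1 \mid x)}

p(x) = P(y=1 \mid x) = \frac{e^{F(x)}}{e^{-F(x)} + e^{F(x)}} = \frac{1}{1 + e^{-2F(x)}}

\ell\bigl(y^*, p(x)\bigr) = y^* \log p(x) + (1 - y^*) \log\bigl(1 - p(x)\bigr) = -\log\bigl(1 + e^{-2yF(x)}\bigr)

p_j(x) = \frac{e^{F_j(x)}}{\sum_{k=1}^{J} e^{F_k(x)}}, \qquad \sum_{k=1}^{J} F_k(x) = 0
```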
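The procedure summarized in the abstract can be sketched as discrete AdaBoost (Freund & Schapire 1996): fit a weak classifier, upweight its mistakes, refit, and classify by a weighted majority vote. The sketch uses one-dimensional decision stumps as the weak classifier. It is a minimal pure-Python illustration under our own assumptions, not the report's code. The stump learner, the clamping of the error rate, and the `trim` option are ours. `trim` is a hypothetical weight-trimming rule that only illustrates the kind of computational shortcut the abstract mentions. When fitting each stump, it skips the lowest-weight observations, up to a fraction `trim` of the total weight. The report's actual modification may differ.

```python
import math


def fit_stump(xs, ys, ws):
    """Weighted decision stump on 1-D inputs: f(x) = s if x > t else -s.
    Tries each distinct x (plus one threshold below all points) and both
    signs, returning the (t, s) with the smallest weighted error."""
    values = sorted(set(xs))
    best_t, best_s, best_err = values[0] - 1.0, 1, float("inf")
    for t in [values[0] - 1.0] + values:
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (s if x > t else -s) != y)
            if err < best_err:
                best_t, best_s, best_err = t, s, err
    return best_t, best_s


def trimmed_indices(ws, trim):
    """Hypothetical trimming rule (an assumption, not the report's): drop the
    lowest-weight observations whose combined weight is at most `trim` of the
    total. trim=0 keeps everything; at least one index is always kept."""
    if trim <= 0:
        return list(range(len(ws)))
    order = sorted(range(len(ws)), key=lambda i: ws[i])
    total, dropped, cut = sum(ws), 0.0, 0
    for i in order:
        if dropped + ws[i] > trim * total:
            break
        dropped += ws[i]
        cut += 1
    return sorted(order[min(cut, len(order) - 1):])


def adaboost(xs, ys, rounds=20, trim=0.0):
    """Discrete AdaBoost with stumps; ys must be in {-1, +1}.
    Returns terms (alpha, t, s) of F(x) = sum of alpha * f(x)."""
    n = len(xs)
    ws = [1.0 / n] * n                      # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        keep = trimmed_indices(ws, trim)    # optionally fit on a subset
        t, s = fit_stump([xs[i] for i in keep], [ys[i] for i in keep],
                         [ws[i] for i in keep])
        preds = [s if x > t else -s for x in xs]
        err = sum(w for w, p, y in zip(ws, preds, ys) if p != y) / sum(ws)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Reweight: multiply misclassified weights by exp(alpha), renormalize.
        ws = [w * math.exp(alpha) if p != y else w
              for w, p, y in zip(ws, preds, ys)]
        z = sum(ws)
        ws = [w / z for w in ws]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote: the sign of F(x)."""
    F = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if F >= 0 else -1
```

For example, with `xs = [1, 2, 3, 4]` and labels `[-1, 1, 1, -1]`, no single stump is right everywhere. After three rounds of `adaboost(xs, ys, rounds=3)`, however, the weighted vote classifies all four points correctly.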
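One of the more direct approximations the abstract refers to is LogitBoost. It fits the additive logistic model by Newton-style steps on the Bernoulli log-likelihood. Each round computes p = 1/(1 + exp(-2F)) and regresses the working response z = (y* - p)/(p(1 - p)) on x with weights p(1 - p). It then adds half of the fitted function to F. Below is our own minimal two-class sketch, not the report's implementation. The regressor is assumed to be a weighted least-squares stump. The working response is clipped to ±`z_max` for numerical stability, with 4.0 as an assumed default.

```python
import math


def prob(f):
    """p = e^F / (e^-F + e^F) = 1 / (1 + e^(-2F)), computed without overflow."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * f))
    e = math.exp(2.0 * f)
    return e / (1.0 + e)


def fit_ls_stump(xs, zs, ws):
    """Weighted least-squares regression stump on 1-D inputs:
    f(x) = a if x <= t else b, with a, b the weighted means of z per side."""
    values = sorted(set(xs))
    best = None
    for t in [values[0] - 1.0] + values:
        left = [(z, w) for x, z, w in zip(xs, zs, ws) if x <= t]
        right = [(z, w) for x, z, w in zip(xs, zs, ws) if x > t]
        sse, means = 0.0, []
        for pts in (left, right):
            wsum = sum(w for _, w in pts)
            m = sum(z * w for z, w in pts) / wsum if wsum > 0 else 0.0
            sse += sum(w * (z - m) ** 2 for z, w in pts)
            means.append(m)
        if best is None or sse < best[0]:
            best = (sse, t, means[0], means[1])
    return best[1], best[2], best[3]


def logitboost(xs, ys, rounds=10, z_max=4.0):
    """Two-class LogitBoost with stumps; ys must be in {-1, +1}.
    Returns stumps (t, a, b); the model is F(x) = sum of 0.5 * f_m(x)."""
    ystar = [(y + 1) / 2 for y in ys]       # recode labels to {0, 1}
    F = [0.0] * len(xs)                     # F = 0 means p = 1/2 everywhere
    model = []
    for _ in range(rounds):
        ps = [prob(f) for f in F]
        ws = [max(p * (1.0 - p), 1e-12) for p in ps]      # Newton weights
        zs = [max(-z_max, min(z_max, (y - p) / w))        # working response
              for y, p, w in zip(ystar, ps, ws)]
        t, a, b = fit_ls_stump(xs, zs, ws)
        model.append((t, a, b))
        F = [f + 0.5 * (a if x <= t else b) for f, x in zip(F, xs)]
    return model


def decision(model, x):
    """Additive predictor F(x): classify by its sign; prob(F) estimates P(y=1|x)."""
    return sum(0.5 * (a if x <= t else b) for t, a, b in model)
```

Unlike a bare weighted vote, this likelihood-based fit also yields class-probability estimates through `prob(decision(model, x))`.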
Available by ftp from:
"ftp://stat.stanford.edu/pub/friedman/boost.ps.Z"
or "ftp://utstat.toronto.edu/pub/tibs/boost.ps.Z"
Comments welcomed.