David Wolpert
dhw at santafe.edu
Sun Jun 25 14:51:58 EDT 1995
In a recent posting, Sylvia Gil asks for "pointers to ...
approaches that ... use non-constant weighting functions (to combine
estimators)."
The oldest work in the neural network community on
non-constant combining of estimators, and by far the most thoroughly
researched, is stacking.^1 Stacking is basically the idea of using the
behavior of estimators when trained on part of the training set and
queried on the rest of it to learn how best to combine those
estimators. The original work on stacking was
Wolpert, D. (1992). "Stacked Generalization". Neural Networks, 5, p. 241.
and the earlier tech report (1990) upon which it was based.
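To make the idea concrete, here is a toy sketch of stacking two
regression estimators. This is my own illustration, not code from any
of the papers cited here; the level-0 estimators (a linear fit and a
k-nearest-neighbour smoother), the fold construction, and the plain
least-squares level-1 combiner are all just illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Level-0 estimator 1: ordinary least squares with an intercept.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.hstack([Xq, np.ones((len(Xq), 1))]) @ w

def fit_knn(X, y, k=5):
    # Level-0 estimator 2: k-nearest-neighbour regression.
    def predict(Xq):
        d = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
    return predict

def stack(X, y, fitters, n_folds=5):
    # Level-1 training data: each estimator's predictions on held-out
    # points, i.e. fit on part of the training set, query on the rest.
    n = len(X)
    Z = np.empty((n, len(fitters)))
    for hold in np.array_split(rng.permutation(n), n_folds):
        train = np.setdiff1d(np.arange(n), hold)
        for j, fit in enumerate(fitters):
            Z[hold, j] = fit(X[train], y[train])(X[hold])
    # Level-1 combiner: least-squares weights on the held-out predictions.
    alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Refit the level-0 estimators on the full training set for queries.
    models = [fit(X, y) for fit in fitters]
    return lambda Xq: np.column_stack([m(Xq) for m in models]) @ alpha

X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
predict = stack(X, y, [fit_linear, fit_knn])
print(predict(np.array([[0.0], [1.5]])))

(Breiman's paper below argues for constraining the combination weights
to be non-negative; unconstrained least squares is used here only for
brevity.)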
Other work on stacking includes the papers
Breiman, L. (1992). "Stacked Regressions". University of California
Berkeley Statistics Dept., tech. report 367. {I believe this is now in
press in Machine Learning.}
LeBlanc, M., and Tibshirani, R. (1993). "Combining estimates in
regression and classification". University of Toronto Statistics Dept.,
tech. report.
In addition, much in each of the following papers concerns
stacking:
Chan, P., and Stolfo, S. (1995). "A Comparative Evaluation of Voting
and Meta-Learning on Partitioned Data". To appear in the Proceedings
of ML 95.
Krogh, A. (1995). To appear in NIPS 7, Morgan Kaufmann. {I forget the
title, as well as who the other author is.}
MacKay, D. (1993). "Bayesian non-linear modeling for the energy
prediction competition". Cavendish Laboratory, Cambridge University
tech. report.
Zhang, X., Mesirov, J., Waltz, D. (1993). J. Mol. Biol., 232, p. 1227.
Zhang, X., Mesirov, J., Waltz, D. (1992), J. Mol. Biol., 225, p. 1049.
Moreover, one of the references Gil mentioned (Hashem's) is
a rediscovery and then investigation of stacking. (Hashem was not
aware of the previous work on stacking when he did his research.)
Finally, a simple variant of using stacking to improve a single
estimator, called "EESA", is the subject of the following paper:
Kim, K., and Bartlett, E. (1995). "Error estimation by series
association for neural network systems". Neural Computation, 7, p. 799.
Two non-stacking references on combining you should probably
read are
Meir, R. (1995). "Bias, variance, and the combination of estimators:
the case of linear least squares". To appear in NIPS 7, Morgan
Kaufmann.
Perrone, M. (1993). Ph.D. thesis, Brown University Physics Dept.
David Wolpert
1 - Actually, there is some earlier work on combining estimators, in
which one does not partition the training set (as in stacking), but
rather uses the residuals (created by training the estimators on the
full training set) to combine those estimators. However, this scheme
consistently performs worse than stacking. See for example the earlier
of the two articles by Zhang et al.
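For contrast, here is the non-partitioned scheme just described, again
only my own sketch; it reuses numpy and the level-0 fitters from the
stacking sketch above.

def combine_on_residuals(X, y, fitters):
    # Fit every estimator on the FULL training set ...
    models = [fit(X, y) for fit in fitters]
    # ... and choose weights by least squares against the same data,
    # i.e. from the in-sample residuals. Each estimator is scored on
    # data it was trained on, so flexible, low-training-error
    # estimators get spuriously large weights.
    Z = np.column_stack([m(X) for m in models])
    alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Xq: np.column_stack([m(Xq) for m in models]) @ alpha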