New posting

David Wolpert <dhw@santafe.edu>
Mon May 11 17:15:17 EDT 1992



****************** DO NOT FORWARD TO OTHER LISTS **********


The following article has been placed in neuroprose under the name
"wolpert.stack_gen.ps.Z". It appears in the current issue of Neural
Networks. It is a rather major rewrite of a preprint of the same name.


STACKED GENERALIZATION

by David H. Wolpert

Abstract: This paper introduces stacked generalization, a scheme for
minimizing the generalization error rate of one or more generalizers.
Stacked generalization works by deducing the biases of the generalizer(s)
with respect to a provided learning set. This deduction proceeds by
generalizing in a second space whose inputs are (for example) the guesses
of the original generalizers when taught with part of the learning set and
trying to guess the rest of it, and whose output is (for example) the correct
guess. When used with multiple generalizers, stacked generalization can be
seen as a more sophisticated version of cross-validation, replacing
cross-validation's crude winner-takes-all rule with a subtler strategy for
combining the individual generalizers. When used with a single generalizer,
stacked generalization is a scheme for estimating (and then correcting for)
the error of a generalizer which has been trained on a particular learning
set and then asked a particular question. After introducing stacked
generalization and justifying its use, this paper presents two numerical
experiments. The first demonstrates how stacked generalization improves
upon a set of separate generalizers for the NETtalk task of translating text
to phonemes. The second demonstrates how stacked generalization improves
the performance of a single surface-fitter. Together with the other
experimental evidence in the literature, the usual arguments supporting
cross-validation, and the abstract justifications presented in this paper,
the conclusion is that
for almost any real-world generalization problem one should use *some*
version of stacked generalization to minimize the generalization error rate.
This paper ends by discussing some of the variations of stacked
generalization, and how it touches on other fields like chaos theory.
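
For readers who want a concrete picture of the scheme sketched in the
abstract, here is a short, illustrative Python fragment. This is an
editorial sketch, not code from the paper; the scikit-learn models, the
function name stacked_generalize, and the toy data are assumptions chosen
only to make the idea runnable. The level-0 generalizers are taught with
part of the learning set and asked to guess the rest; their guesses form
the inputs of a second learning set whose outputs are the correct answers,
and a level-1 generalizer trained on that set combines their guesses when
answering new questions.

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

def stacked_generalize(X, y, X_query, level0_models, level1_model, n_folds=5):
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    # Level-1 learning set: one column of guesses per level-0 generalizer.
    level1_inputs = np.empty((len(y), len(level0_models)))
    for train_idx, held_out_idx in kf.split(X):
        for j, model in enumerate(level0_models):
            m = clone(model).fit(X[train_idx], y[train_idx])
            # Guesses on the part of the learning set this model did not see.
            level1_inputs[held_out_idx, j] = m.predict(X[held_out_idx])
    # Teach the level-1 generalizer to map level-0 guesses to correct outputs.
    combiner = clone(level1_model).fit(level1_inputs, y)
    # For new questions, retrain each level-0 model on the full learning set,
    # collect their guesses, and let the level-1 generalizer combine them.
    full_models = [clone(m).fit(X, y) for m in level0_models]
    query_guesses = np.column_stack([m.predict(X_query) for m in full_models])
    return combiner.predict(query_guesses)

# Toy usage: combine two dissimilar surface-fitters on a 1-D problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_query = np.linspace(-3, 3, 5).reshape(-1, 1)
print(stacked_generalize(
    X, y, X_query,
    level0_models=[KNeighborsRegressor(5), DecisionTreeRegressor(max_depth=4)],
    level1_model=LinearRegression()))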


To retrieve this article, do the following:

unix> ftp archive.cis.ohio-state.edu
login: anonymous
password: {your e-mail address}
ftp> binary
ftp> cd pub/neuroprose
ftp> get wolpert.stack_gen.ps.Z
ftp> quit
unix> uncompress wolpert.stack_gen.ps.Z
unix> lpr wolpert.stack_gen.ps {or however you print PostScript files}

