Combining Generalizers

Kagan Tumer kagan at pine.ece.utexas.edu
Mon Jul 3 10:41:52 EDT 1995


Lately, there has been a great deal of interest in combining
estimates, and especially combining neural network outputs.
Combining has a LONG history, with seminal ideas contained
in Selfridge's Pandemonium (1958) and Nilsson's book on Learning 
Machines (1965); it is also found in diverse areas
(e.g., in econometrics as "forecast combining" for at least 20 years).

Recent research on this topic in the neural net/machine learning 
community largely focuses on 
(i) WHAT (type of) experts to combine, or
(ii) HOW to combine them, or
(iii) EXPERIMENTALLY showing that combining gives better results.

Another important question is how much benefit (percentage gains, limits,
reliability, ...) combining methods can yield. At least two recent PhD theses
(Perrone, Hashem) mathematically address this issue for REGRESSION problems.

We have approached this problem for CLASSIFICATION problems
by studying the effect of combining on the 
decision boundaries. The results pinpoint the mechanism by
which classification results are improved, and provide limits,
including a new way of estimating Bayes' rate.
A preliminary version appears as an invited paper in SPIE Proc. Vol. 2492,
pp. 573-585 (Orlando Conf., April '95); the full version,
currently under journal review, can be retrieved from

http://pegasus.ece.utexas.edu:80/~kagan/publications.html

Besides review and analysis, it contains a reference listing
that includes most of the papers quoted on this forum in the past week.
The title and abstract follow:

    THEORETICAL FOUNDATIONS OF LINEAR AND ORDER
STATISTICS COMBINERS FOR NEURAL PATTERN CLASSIFIERS 
                    by
        Kagan Tumer and Joydeep Ghosh


Several researchers have experimentally shown that substantial
improvements can be obtained in difficult pattern recognition
problems by combining or integrating the outputs of multiple
classifiers.
This paper provides an analytical framework to quantify the
improvements in classification results due to combining. The
results apply to both linear combiners and the order statistics
combiners introduced in this paper.
We show that combining networks in output space
reduces the variance of the actual decision region boundaries
around the optimum boundary.
For linear combiners, we show that in the absence of classifier bias,
the added classification error is proportional to the boundary variance.
In the presence of bias, the error reduction is shown to be less than or
equal to the reduction obtained in the absence of bias.
For non-linear combiners, we show analytically that the selection of
the median, the maximum, and in general the $i$th order statistic
improves classifier performance.
The analysis presented here facilitates the
understanding of the relationships among error rates,
classifier boundary distributions, and combining in output space.
The combining results can also be used to estimate
the Bayesian error rates.
Experimental results on several public domain data sets
are provided to illustrate the benefits of combining.
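
To make the combining operations in the abstract concrete, here is a minimal,
hypothetical sketch (not taken from the paper; the toy numbers are invented)
of how a linear (averaging) combiner and the order-statistics combiners act on
the class outputs of several classifiers for one input pattern:

    import numpy as np

    # Toy outputs of N = 3 classifiers over 3 classes for a single input
    # (rows = classifiers, columns = approximate class posteriors).
    outputs = np.array([
        [0.60, 0.30, 0.10],
        [0.45, 0.40, 0.15],
        [0.70, 0.20, 0.10],
    ])

    # Linear combiner: average the outputs class by class.
    ave_comb = outputs.mean(axis=0)

    # Order-statistics combiners: take the median or the maximum (in general,
    # the i-th order statistic) of each class output across the classifiers.
    med_comb = np.median(outputs, axis=0)
    max_comb = outputs.max(axis=0)

    # The decision is the class with the largest combined output.
    for name, comb in (("ave", ave_comb), ("med", med_comb), ("max", max_comb)):
        print(name, comb, "-> decision:", int(np.argmax(comb)))

For intuition on the linear case (assuming, as in the no-bias result above,
independent, zero-mean boundary offsets with variance sigma^2 across the
classifiers): the average of N such offsets has variance sigma^2/N, so an
added error proportional to the boundary variance shrinks by the same factor.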

------
All comments welcome.
______
Sorry, no hard copies.

Kagan Tumer
Dept. of ECE
The University of Texas, Austin


