Combining classifiers to reduce false alarm rate

Jonathan_Stein@comverse.com Jonathan_Stein at comverse.com
Tue Jul 4 12:14:27 EDT 1995


In a recent thread on the subject of combining classifiers,
Nathan Intrator and David Wolpert mentioned how important it is
that the classifiers being combined be uncorrelated.
It is obvious that if the classifiers make the same errors
(no matter what their architectures), nothing can be gained
by combining them. To make the basic classifiers less
correlated, one can train them on different subsets of the training
set, but the trade-off is that each classifier then sees a smaller
training set.
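A minimal sketch of that subset idea, assuming scikit-learn's MLPClassifier
as a stand-in for the networks (the post does not tie the argument to any
particular library): each net is fitted on its own disjoint slice of the
training data, so their errors are less likely to coincide, at the cost of
fewer examples per net.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_on_disjoint_subsets(X, y, n_nets=3, seed=0):
        """Train n_nets MLPs, each on its own slice of the (numpy) training set."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(X))        # shuffle once, then split
        nets = []
        for idx in np.array_split(order, n_nets):
            net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500)
            net.fit(X[idx], y[idx])            # each net sees ~1/n_nets of the data
            nets.append(net)
        return nets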

This is true for closed set problems. For open set problems
(e.g. continuous speech and handwriting recognition), for which there
are "false alarm" errors in addition to misclassification errors,
it may be sufficient to use the same training set with different
training algorithms, or even merely with different starting classifiers.

Assuming the training processes succeed, these networks will agree on the
training set, and should behave similarly on patterns similar to those
in the training set. However, the networks will probably disagree on
patterns unlike those in the training set, since no constraints were
placed on these during training. Thus it would seem that by combining
the opinions of several networks, the false alarm rate may be
drastically reduced without significantly reducing the classification rate
(perhaps even improving it).
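A minimal sketch of that combination rule, again assuming scikit-learn MLPs
as hypothetical stand-ins for the networks: the nets share one training set
and differ only in their random starting weights, and a candidate is accepted
only when every net votes for the same label; any disagreement is treated as
a false alarm and rejected.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_ensemble(X, y, n_nets=3):
        """Same training set, different starting weights via random_state."""
        return [MLPClassifier(hidden_layer_sizes=(20,), max_iter=500,
                              random_state=seed).fit(X, y)
                for seed in range(n_nets)]

    def classify_or_reject(nets, x):
        """Return the agreed label, or None to reject x as a false alarm."""
        votes = [net.predict(np.asarray(x).reshape(1, -1))[0] for net in nets]
        return votes[0] if len(set(votes)) == 1 else None

A majority vote, or a threshold on the averaged output scores, is an obvious
softer variant of the unanimity rule sketched here.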

Geometrically speaking, for MLP classifiers the training set imposes
restrictions on the placement of each classifier's hyperplanes. The negative
examples will naturally fall at random into domains corresponding to some
class, but unless they are sufficiently similar to positive examples,
the domains into which they fall should be decorrelated across the
different networks. When the identifications are compared, the different
networks should respond similarly to positive examples, but will tend to
disagree on the negative ones. This behavior should allow one to
differentiate between negatives and positives, and thus effectively
reject false alarms.

Over the past four years we have found empirically that the false alarm
rate of cursive script and continuous speech recognition systems can be
significantly reduced by combining the outputs of several multilayer
perceptrons. We have also observed similar effects on artificial benchmark
problems, and have proven the idea analytically for a solvable toy problem.
I can email LaTeX conference papers to interested parties.


Jonathan (Yaakov) Stein



