No subject

Drucker Harris drucker at monmouth.edu
Wed Jul 28 17:14:17 EDT 1993


About committees and boosting:

My previous communication used the word "best".  That was a little
puffery on my part, and a concession to the brevity this medium imposes.
For an explicit list of the limitations, assumptions, etc. governing when
and where boosting applies, send me your e-mail address and I will send
you a troff file of a preprint of an article.  I can also send hard
copies if there are not too many requests.

More to the point:  if you are interested in classification and want to
improve performance, boosting is a reasonable approach.  Instead of
struggling to build different classifiers and then figuring out the best
way to combine them, boosting by filtering explicitly shows how to filter
the data so that each machine learns a different distribution of the
training set (a sketch follows).  In our work in OCR using multilayer
networks (single-layer networks are not powerful enough), boosting has
ALWAYS improved performance.  Synthetically enlarging the database using
deformations of the original data is essential.
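
For the curious, here is a minimal Python sketch of the filtering.  The
names (net1.classify, examples, and so on) are placeholders for
illustration, not our actual code; assume `examples` is an effectively
endless iterator of labeled (x, y) pairs, e.g. deformed variants of the
original digits.

import random

def second_training_set(net1, examples, size):
    # Build a set of which net1 misclassifies half, in expectation:
    # flip a fair coin, then draw from the example stream until one
    # matches the coin (error wanted / no error wanted).
    chosen = []
    while len(chosen) < size:
        want_error = random.random() < 0.5
        for x, y in examples:
            if (net1.classify(x) != y) == want_error:
                chosen.append((x, y))
                break
    return chosen

def third_training_set(net1, net2, examples, size):
    # Keep only the examples on which the first two networks disagree.
    chosen = []
    for x, y in examples:
        if net1.classify(x) != net2.classify(x):
            chosen.append((x, y))
            if len(chosen) == size:
                break
    return chosen

The third network trains on the disagreements, and the committee then
combines the three networks' outputs as described in the final note
below.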

In one case, a network (circa 1990) which had
an error rate on United States Postal Service digits of 4.9% and a reject rate of
11.5% (in order to achieve a 1% error rate on those not rejected) was boosted to
give a 3.6% error rate and a 6.6% reject rate.  Someone then invented a new network
that had a 3.3% error rate and a 7.7% reject rate and this was boosted to give a 2.6% 
error rate and 4.0% reject rate.  This is very close to the estimated human performance
of 2.5%.

Can someone find a single network (using the original database) that is
better than a boosted committee?  Maybe.  But good networks are hard to
find, and if you can find one, you can probably boost it.

Can one improve performance by using the synthetically enlarged database
and a "larger" single machine?  Yes, but we have yet to find a single
network that does better than a boosted committee.

A final note:  rather than straight voting, we have found that simply
summing the respective outputs of the three neural networks gives MUCH
better results (as quoted above).  As pointed out by David Bisant, voting
does not explicitly include the confidence.  In neural networks, a
measure of confidence is the difference between the two largest outputs.
By simply voting, you ignore the fact that one member of the committee
may be very confident about its result.  By adding, networks with high
confidence influence the result more, lowering both the error rate and
especially the reject rate.
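
To make that concrete, here is a minimal sketch of the combining rule,
assuming each network returns a vector of ten class scores (one per
digit); the reject threshold is illustrative, not a figure from our
experiments.

import numpy as np

def committee_decision(outputs, reject_threshold=0.2):
    # outputs: list of three score vectors, one per network.
    summed = np.sum(outputs, axis=0)       # add the outputs, don't vote
    top_two = np.sort(summed)[-2:]         # the two largest summed scores
    confidence = top_two[1] - top_two[0]   # gap between best and runner-up
    if confidence < reject_threshold:
        return None                        # reject: too close to call
    return int(np.argmax(summed))

A confident network contributes a large gap to the sum, so it can pull
the committee toward the right answer even when the other two are
lukewarm; that is exactly the information plain voting throws away.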


Harris Drucker
Bell Labs phone:  908-949-4860
Monmouth College phone:  908-571-3698
email: drucker at monmouth.edu (preferred)


