Mon Jul 26 14:15:19 EDT 1993

Subject: Committee Machines

The best method to generate a committee of learning machines is given by
Schapire's algorithm [1].  The boosting algorithm that constructs a committee
of three machines is as follows: 

(1) Train a first learning machine using some training set.

(2) A training set for a second committee machine is obtained  
in the following manner:

   (a) Toss a fair coin.  If heads, pass NEW data through the first machine
   until it misclassifies a pattern, and add that misclassified pattern to
   the training set for the second machine.  If tails, pass NEW data through
   the first machine until it classifies a pattern correctly, and add that
   pattern to the training set for the second machine.  Patterns classified
   correctly while waiting on heads, or misclassified while waiting on tails,
   are not used.  Iterate until there is a large enough training set.  Thus
   the training set for the second machine consists of data which, if passed
   through the first machine, would give a 50% error rate.

   (b) Train the second machine.

(3) A training set for a third machine is obtained in the following manner:

   (a) Pass NEW data through the first two trained machines.  If the two
   machines agree on the classification (whether correct or not), discard
   the pattern.  If they disagree, add the pattern to the training set for
   the third machine.  Iterate until there is a large enough training set.

   (b) Train the third machine.

(4)  In the testing phase, a pattern is presented to all three machines.  If
the first two machines agree, use that labeling; otherwise use the labeling
of the third machine.
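
Steps (1) through (4) can be sketched in Python as below.  This is only a
minimal illustration, not the code used in [2].  The names train, draw, and
max_draws are assumptions made for the sketch: train is any routine that
takes a list of (pattern, label) pairs and returns a classifier, draw returns
one NEW labeled pattern per call, and max_draws is a safety cap so that the
filters stop even if the required kind of pattern never turns up.

```python
import random

def build_second_set(m1, draw, size, rng, max_draws=100_000):
    """Step (2a): coin-toss filter of NEW data through the first machine."""
    out = []
    want_error = rng.random() < 0.5  # "heads": wait for a misclassified pattern
    for _ in range(max_draws):       # cap is an assumption, not part of the post
        if len(out) == size:
            break
        x, y = draw()
        if (m1(x) != y) == want_error:
            out.append((x, y))               # keep the pattern the coin asked for
            want_error = rng.random() < 0.5  # toss again for the next pattern
        # otherwise the pattern is not used
    return out

def build_third_set(m1, m2, draw, size, max_draws=100_000):
    """Step (3a): keep only NEW patterns on which the first two machines disagree."""
    out = []
    for _ in range(max_draws):
        if len(out) == size:
            break
        x, y = draw()
        if m1(x) != m2(x):
            out.append((x, y))
    return out

def committee(m1, m2, m3):
    """Step (4): use the first two machines if they agree, else the third."""
    def classify(x):
        a, b = m1(x), m2(x)
        return a if a == b else m3(x)
    return classify

def boost(train, draw, n1, n2, n3, rng):
    """Steps (1)-(3): train three machines and return the committee classifier."""
    m1 = train([draw() for _ in range(n1)])
    m2 = train(build_second_set(m1, draw, n2, rng))
    m3 = train(build_third_set(m1, m2, draw, n3))
    return committee(m1, m2, m3)
```

Both filters return a short list if max_draws is reached first, so a caller
should check the size of what comes back.  The better the first machine, the
more NEW data the heads branch of step (2a) consumes per pattern kept.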

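The only problem with this approach is generating enough data, since the
training sets for the second and third machines are built by filtering NEW
data, much of which is not used.  For optical character recognition (OCR)
we have synthetically enlarged the database by deforming the original data
[2].  Boosting dramatically reduced error rates.  A new paper with much
more detail is to be published [3].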

Harris Drucker

References:

1. R. Schapire, "The Strength of Weak Learnability", Machine Learning,
Vol 5, Number 2, (1990), p. 197-227.

2. H. Drucker, R. Schapire, and P. Simard, "Improving Performance in Neural
Networks Using a Boosting Algorithm", Neural Information Processing Systems 5,
proceedings of the 1992 conference (published 1993), Eds: J. Hanson, J. Cowan,
C. L. Giles, p. 42-49.

3. H. Drucker, R. Schapire, and P. Simard, "Boosting Performance in Neural
Networks", International Journal of Pattern Recognition and Artificial
Intelligence, Vol 7, Number 4, (1993), to be published.