Drucker Harris
drucker at monmouth.edu
Mon Jul 26 14:15:19 EDT 1993
Subject: Committee Machines:
The best method to generate a committee of learning machines is given by
Schapire's algorithm [1]. The boosting algorithm that constructs a committee
of three machines is as follows:
(1) Train a first learning machine using some training set.
(2) A training set for a second committee machine is obtained
in the following manner:
(a) Toss a fair coin. If heads, pass NEW data through the first machine
until it misclassifies a pattern, and add that misclassified pattern to
the training set for the second machine. If tails, pass NEW data through
the first machine until it classifies a pattern correctly, and add that
pattern to the training set for the second machine. Patterns classified
correctly on heads, or misclassified on tails, are discarded. Repeat this
procedure until the training set is large enough. The resulting training
set for the second machine is one on which the first machine would have a
50% error rate.
(b) Train the second machine.
(3) A training set for a third machine is obtained in the following manner:
(a) Pass NEW data through the first two trained machines. If the two machines
agree on the classification (whether correct or not), toss out the data. If
they disagree, add this data to the training set for the third machine.
Iterate until there is a large enough training set.
(b) Train the third machine.
(4) In the testing phase, a pattern is presented to all three machines. If
the first two machines agree, use that labeling; otherwise use the labeling
of the third machine.
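The data-selection and voting steps above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the code
used in [2]: a "machine" is assumed to be any callable that maps a pattern to
a label, the function names are hypothetical, training itself is left out,
and fixed threshold functions stand in for trained networks on a toy 1-D
problem.

```python
import random

def build_second_set(stream, first, size, rng):
    """Step (2a): pick NEW examples so the first machine errs on about half.

    Assumes size >= 1 and that the first machine makes at least some errors;
    otherwise a heads toss would never be satisfied.
    """
    selected = []
    want_error = rng.random() < 0.5            # fair coin: heads = want an error
    for x, y in stream:
        if (first(x) != y) == want_error:
            selected.append((x, y))
            if len(selected) == size:
                break
            want_error = rng.random() < 0.5    # toss again for the next example
        # otherwise the example is discarded
    return selected

def build_third_set(stream, first, second, size):
    """Step (3a): keep only NEW examples on which the first two machines disagree."""
    selected = []
    for x, y in stream:
        if first(x) != second(x):
            selected.append((x, y))
            if len(selected) == size:
                break
    return selected

def committee_predict(first, second, third, x):
    """Step (4): if the first two machines agree use that label, else ask the third."""
    a, b = first(x), second(x)
    return a if a == b else third(x)

def example_stream(rng):
    """Hypothetical endless source of NEW data: x in [0, 1), label 1 if x > 0.5."""
    while True:
        x = rng.random()
        yield x, int(x > 0.5)

# Stand-ins for trained machines (assumptions made for this demo only).
rng = random.Random(0)
first = lambda x: int(x > 0.4)     # wrong for 0.4 < x <= 0.5
second = lambda x: int(x > 0.6)    # wrong for 0.5 < x <= 0.6
third = lambda x: int(x > 0.5)     # correct where the first two disagree

second_set = build_second_set(example_stream(rng), first, 200, rng)
third_set = build_third_set(example_stream(rng), first, second, 50)
error_rate = sum(first(x) != y for x, y in second_set) / len(second_set)
print("first-machine error rate on second set:", error_rate)
print("committee label for x = 0.45:", committee_predict(first, second, third, 0.45))
```

In a real run, the second machine would be trained on second_set before
build_third_set is called, and the third machine on third_set. Note how much
of the stream is thrown away by both filters.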
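The only problem with this approach is generating enough data: the training
sets for the second and third machines are filtered from NEW data, and when
the earlier machines are already accurate most of that data is discarded.
For OCR we have synthetically enlarged the database by deforming the original
data [2]. Boosting dramatically reduced error rates. We are publishing a new
paper that has much more detail [3].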
Harris Drucker
References:
1. R. Schapire, "The Strength of Weak Learnability," Machine Learning,
Vol. 5, Number 2 (1990), pp. 197-227.
2. H. Drucker, R. Schapire, and P. Simard, "Improving Performance in Neural
Networks Using a Boosting Algorithm," Neural Information Processing Systems 5,
Proceedings of the 1992 Conference (published 1993), Eds: J. Hanson, J. Cowan,
C. L. Giles, pp. 42-49.
3. H. Drucker, R. Schapire, and P. Simard, "Boosting Performance in Neural
Networks," International Journal of Pattern Recognition and Artificial
Intelligence, Vol. 7, Number 4 (1993), to be published.