combining generalizers' guesses

David Cohn cohn at psyche.mit.edu
Mon Jul 26 12:01:15 EDT 1993


Tom Dietterich <tgd at chert.cs.orst.edu> writes:
> ... (good stuff deleted) ...
> This analysis predicts that using a committee of very diverse
> algorithms (i.e., having diverse approximation errors) would yield
> better performance (as long as the committee members are competent)
> than a committee made up of a single algorithm applied multiple times
> under slightly varying conditions.  
> 
> ...
>
> So, it seems to me the key question is what are the best ways of
> creating a diverse "committee"? 
> 
> --Tom

One possible way of diversifying the committee (don't *I* sound PC!)
is to make the inductive bias of the learning algorithm explicit, or,
as an approximation, to add a new inductive bias that is strong
enough to override the biases inherent in the algorithm. This can be
done in a number of ways: by adding extra terms to the error
equation, or by some other kludge.  Running the same algorithm with
widely differing biases then approximates running different
algorithms.
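
To make that concrete, here is a minimal sketch, purely my own
illustration: an error function with an explicit, tunable bias term
tacked on, where plain weight decay stands in for whatever prior you
actually want to impose.

    import numpy as np

    def total_error(pred, target, weights, lam):
        # ordinary squared error on the training data ...
        fit = np.mean((pred - target) ** 2)
        # ... plus an explicit bias term; plain weight decay here, but
        # any penalty encoding the prior you want would do.  With lam
        # large enough, this term dominates the algorithm's own bias.
        return fit + lam * np.sum(weights ** 2)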

[warning: blatant self-promotion follows :-)]
For example, a few years ago we looked at something like this with a
different end in mind: the selective sampling algorithm used what has
become known as the committee approach (with a twist) to identify
potentially useful training examples.

Two identical networks were trained on the same positive/negative
classification problem with the same training data, but we added two
different inductive biases to the backprop training: One network (S)
was trained to find the most *specific* concept consistent with the
data; that is, it tried to classify only the positive training
examples as positive, and to classify as much of the rest of the
domain as possible as negative. The other network (G) was trained to
find the most *general* concept consistent with the data, that is, to
classify as much of the domain as positive as it could while still
accommodating the negative training examples.

The purpose of these biases was to decide whether a potential training
example was interesting. If the two networks disagreed on its
classification, then it lay in the architecture's version space and
should be queried/added.
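
In code, the scheme looks roughly like the toy sketch below. To be
clear, this is my own reconstruction for illustration, not the code
from the paper: a single logistic unit stands in for each network, the
names (train_biased, should_query, X_bg for the unlabeled "background"
points) are all mine, and the bias is implemented as a penalty pulling
background predictions toward negative (for S) or positive (for G).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_biased(X, y, X_bg, bias_target, lam=1.0, lr=0.5, steps=5000):
        # Fit a logistic unit (standing in for a backprop net) by
        # gradient descent on squared error over the labeled data, plus
        # a penalty pulling predictions on background points toward
        # bias_target: 0.0 gives the specific net S, 1.0 the general G.
        rng = np.random.default_rng(0)
        Xb = np.hstack([X, np.ones((len(X), 1))])      # append bias input
        Bb = np.hstack([X_bg, np.ones((len(X_bg), 1))])
        w = rng.normal(scale=0.1, size=Xb.shape[1])
        for _ in range(steps):
            p = sigmoid(Xb @ w)                        # labeled predictions
            q = sigmoid(Bb @ w)                        # background predictions
            # gradient of fit + lam*pull (constants folded into lr)
            g = (Xb.T @ ((p - y) * p * (1 - p)) / len(X)
                 + lam * Bb.T @ ((q - bias_target) * q * (1 - q)) / len(X_bg))
            w -= lr * g
        return w

    def predict(w, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return sigmoid(Xb @ w) > 0.5

    def should_query(w_S, w_G, x):
        # Query x exactly when S and G disagree on it, i.e. when x
        # lies in the (approximate) version space of the architecture.
        x = np.atleast_2d(x)
        return bool(predict(w_S, x)[0] != predict(w_G, x)[0])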

These and other biases would also suggest themselves as appropriate
for producing diverse committee members to vote on the
classification/output of a network.
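
For the committee use, the same pieces could be reused directly; a
hypothetical two-liner, building on predict() from the sketch above:

    # Hypothetical follow-on, reusing predict() from the sketch above:
    # let several differently biased nets vote on a single candidate.
    def committee_vote(weight_vectors, x):
        votes = [predict(w, np.atleast_2d(x))[0] for w in weight_vectors]
        return sum(votes) > len(votes) / 2      # simple majority wins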

For those interested in the details of the selective sampling
algorithm, we have a paper which is to appear in Machine Learning.
It is available by anonymous ftp to "psyche.mit.edu"; the paper is
in "pub/cohn/selsampling.ps.Z".

 -David Cohn				e-mail: cohn at psyche.mit.edu
  Dept. of Brain & Cognitive Science	phone:  (617) 253-8409
  MIT, E10-243
  Cambridge, MA 02139

