combining generalizers' guesses

Tom Dietterich tgd at chert.CS.ORST.EDU
Sat Jul 24 11:43:24 EDT 1993


It seems to me that more attention needs to be paid to *which*
generalizer's guesses we are combining.  There are three basic
components that determine generalization error:

   * inherent error in the data (which determines the Bayes optimal error rate)
   * small sample size
   * approximation error (which prevents the algorithm from correctly
     expressing the Bayes optimal hypothesis, even with infinite samples)
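
The first source can be made concrete with a small simulation (a
hypothetical sketch, not from the post; the 10% noise rate is an assumed
number): when each label is flipped with some fixed probability, even a
classifier that knows the true decision rule errs at about that rate, so
no scheme for combining guesses can go below it.

```python
# Hypothetical sketch: inherent label noise sets a floor (the Bayes
# optimal error rate) that no combination of guesses can go below.
import random

random.seed(0)
NOISE = 0.1  # assumed label-flip probability, for illustration only

def true_label(x):
    return 1 if x > 0.5 else 0

def noisy_label(x):
    y = true_label(x)
    return 1 - y if random.random() < NOISE else y

# Even the Bayes optimal classifier (here, true_label itself)
# disagrees with the observed labels at a rate close to NOISE.
xs = [random.random() for _ in range(100_000)]
rate = sum(true_label(x) != noisy_label(x) for x in xs) / len(xs)
print(rate)
```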

Combining guesses can't do anything about the first problem.  I also
don't think it can have much effect on the second problem, because all
of the guesses are based on the same data.  I think that
the real win comes from combining guesses that make very different
approximation errors.

In my error-correcting code work, we re-code the outputs (by imposing
an error-correcting distributed representation) in such a way that (we
believe) the approximation errors committed by the learning algorithm
are nearly independent.  Then the decoding process combines these
guesses.  Interestingly, the decoding process takes a linear
combination of the guesses where the linear combination is unique for
each output class.  We are currently doing experiments to try to
understand the relative role of these three sources of error in the
performance of error-correcting output codes.
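
A minimal sketch of the decoding idea (the code matrix and error rates
below are made up for illustration; the actual codes and learners in the
error-correcting code work differ): each class is assigned a binary
codeword, one binary learner is trained per bit, and a test point gets
the class whose codeword is nearest in Hamming distance to the learners'
guesses.  If the per-bit errors are nearly independent, the decoded
class error falls below the per-bit error rate, because the code
corrects isolated mistakes.

```python
# Hypothetical ECOC sketch: 4 classes, 7-bit codewords with minimum
# Hamming distance 4, so any single bit error is corrected.  The binary
# learners are modeled as independent bit flips rather than trained.
from itertools import product

CODE = {
    0: (0, 0, 0, 0, 0, 0, 0),
    1: (1, 1, 1, 1, 0, 0, 0),
    2: (1, 1, 0, 0, 1, 1, 0),
    3: (0, 1, 1, 0, 1, 0, 1),
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    # nearest-codeword decoding
    return min(CODE, key=lambda c: hamming(CODE[c], bits))

def class_error(bit_error):
    # Exact decoded-class error rate, averaging over classes and
    # enumerating every pattern of independent per-bit errors.
    err = 0.0
    for c, codeword in CODE.items():
        for flips in product((0, 1), repeat=len(codeword)):
            p = 1.0
            for f in flips:
                p *= bit_error if f else 1.0 - bit_error
            received = tuple(b ^ f for b, f in zip(codeword, flips))
            if decode(received) != c:
                err += p / len(CODE)
    return err

# With 5% independent per-bit error, the decoded class error is below
# the 5% per-bit rate, since all single-bit errors are corrected.
print(class_error(0.05))
```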

This analysis predicts that using a committee of very diverse
algorithms (i.e., having diverse approximation errors) would yield
better performance (as long as the committee members are competent)
than a committee made up of a single algorithm applied multiple times
under slightly varying conditions.  
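One way to see the force of this prediction is a toy calculation (the
numbers are illustrative, not from the post): with seven binary
committee members each erring 20% of the time, majority vote errs only
about 3% if the members' errors are independent, but stays at 20% if
all members make the same errors.  Diversity of approximation errors,
not committee size, drives the gain.

```python
# Toy calculation with illustrative numbers: majority-vote error for n
# binary committee members with individual error rate p, assuming the
# members' errors are independent.
from math import comb

def majority_error(n, p):
    # probability that a strict majority of n independent members err
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Independent errors: the vote is far better than any single member.
print(majority_error(7, 0.2))  # ~0.033

# Perfectly correlated errors: all members err together with
# probability p, so the vote errs at p = 0.2 -- no gain at all.
```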

In the error-correcting code work, we compared a committee of decision
trees to an error-correcting output procedure that also used decision
trees.  The members of the committee were generated by training on
different subsamples of the data (as in stacking), but the combination
method was simple voting.  No matter how many trees we added to the
committee, we could not come close to achieving the same performance
on the NETtalk task as with the error-correcting output coding
procedure.

So, it seems to me the key question is what are the best ways of
creating a diverse "committee"? 

--Tom
