combining generalizers' guesses

bisant@samish.stanford.edu
Tue Jul 27 14:37:18 EDT 1993


>It seems to me that more attention needs to be paid to *which*
>generalizer's guesses we are combining.  There are three basic
>components that determine generalization error:
>
>   * inherent error in the data (which determines the Bayes-optimal error rate)
>   * small sample size
>   * approximation error (which prevents the algorithm from correctly
>     expressing the Bayes-optimal hypothesis, even with infinite samples)


   I also think there is another source of error, in addition to those
given above, which can be reduced by combining generalizers.  This
source is:

      * the lack of confidence in an individual prediction.

Most neural networks and other classifiers produce a continuous
output.  Usually, during classification, a threshold or
winner-take-all method is used to decide the classification.  Imagine
a network which classifies inputs into one of 3 classes and which
produces outputs such as the following:

     a.  0.27  0.21  0.86      b.  0.41  0.48  0.53

In both cases the third class is the winner, but classification "a"
is clearly made with much more confidence than "b".  Whichever
arbitration mechanism is used to combine the generalizers should
take this information into account.
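
As a minimal sketch of this point (my own illustration, not anything
from the work discussed here), the following Python fragment
contrasts a hard winner-take-all vote, which throws away each
network's confidence, with a simple average of the continuous
outputs, which retains it.  The output values and the number of
networks are invented for the example.

    import numpy as np

    # Hypothetical continuous outputs of three networks for one input,
    # one row per network, one column per class (values are invented).
    outputs = np.array([
        [0.27, 0.21, 0.86],   # strongly prefers class 3
        [0.52, 0.48, 0.45],   # weakly prefers class 1
        [0.51, 0.30, 0.44],   # weakly prefers class 1
    ])

    # Hard winner-take-all vote: each network casts one vote for its
    # largest output, so the confidence information is discarded.
    votes = np.argmax(outputs, axis=1)
    vote_winner = np.bincount(votes, minlength=outputs.shape[1]).argmax()

    # Simple average of the continuous outputs keeps the confidence:
    # the strongly confident network pulls the average toward class 3.
    mean_output = outputs.mean(axis=0)
    average_winner = np.argmax(mean_output)

    print("majority vote picks class", vote_winner + 1)   # class 1
    print("averaged outputs:", mean_output)               # [0.433 0.33  0.583]
    print("averaging picks class", average_winner + 1)    # class 3

Here the two weakly confident networks outvote the single confident
one under a hard vote, while the simple average still follows the
confident network.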


> So, it seems to me the key question is what are the best ways of
> creating a diverse "committee"? 

Most researchers who apply neural networks to real problems use a
committee approach for the final decision.  Some empirical research
has been done over the last 4 years to find the best way of doing
this.  About 3 years ago, Waibel and Hampshire presented work at
NIPS and IJCNN, and in the IEEE Transactions on Neural Networks,
in which they used different objective functions to create very
diverse networks.  I believe they used the following objective
functions:
    1  squared error
    2  classification figure of merit (CFM)
    3  cross entropy.
The networks produced, especially the one trained with CFM, were
very different.  As an arbitration mechanism, they found that a
simple average worked better than other more complicated methods,
including a neural network.  All the arbitration mechanisms they
tried were able to take the confidence factor, mentioned above,
into account.
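
For concreteness, here is a rough sketch (again my own, and only
loosely based on the cited paper) of what the three objective
functions look like for a single training example, given a network's
output vector and a one-hot target.  Squared error and cross entropy
are the standard forms; the CFM term below only captures the general
idea of pushing the correct output above each competing output
through a sigmoid, and the exact form and constants in the 1990
paper differ.

    import numpy as np

    def squared_error(output, target):
        """Standard sum-of-squares error for one example."""
        return np.sum((output - target) ** 2)

    def cross_entropy(output, target, eps=1e-12):
        """Standard cross-entropy error for one example (outputs in (0, 1))."""
        return -np.sum(target * np.log(output + eps)
                       + (1 - target) * np.log(1 - output + eps))

    def cfm(output, correct_class, beta=4.0):
        """Classification figure of merit, sketched here as a sigmoid of
        the margin between the correct output and each competing output.
        Higher is better, so training maximizes it; the constants and
        exact form in the 1990 paper differ from this sketch."""
        margins = output[correct_class] - np.delete(output, correct_class)
        return np.mean(1.0 / (1.0 + np.exp(-beta * margins)))

    # Example: three-class output with class 3 as the correct class.
    output = np.array([0.27, 0.21, 0.86])
    target = np.array([0.0, 0.0, 1.0])
    print(squared_error(output, target), cross_entropy(output, target), cfm(output, 2))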

David Bisant
Stanford PDP Group


@article{hampshire2,
  author  = "Hampshire II, J. B. and Waibel, A. H.",
  title   = "{A Novel Objective Function for Improved Phoneme Recognition
             Using Time-Delay Neural Networks}",
  journal = "IEEE Transactions on Neural Networks",
  volume  = "1",
  number  = "2",
  year    = "1990",
  pages   = "216--228"
}

