Multiple Models, Committee of nets etc...
Michael P. Perrone
mpp at cns.brown.edu
Thu Jul 29 03:27:21 EDT 1993
Tom Dietterich writes:
> This analysis predicts that using a committee of very diverse
> algorithms (i.e., having diverse approximation errors) would yield
> better performance (as long as the committee members are competent)
> than a committee made up of a single algorithm applied multiple times
> under slightly varying conditions.
and David Wolpert writes:
>There is a good deal of heuristic and empirical evidence supporting
>this claim. In general, when using stacking to combine generalizers,
>one wants them to be as "orthogonal" as possible, as Tom maintains.
One minor result from my thesis shows that when the estimators are
orthogonal in the sense that

	E[n_i(x) n_j(x)] = 0 for all i != j

where n_i(x) = f(x) - f_i(x), f(x) is the target function, f_i(x) is
the i-th estimator, and the expected value is over the underlying
distribution, then the MSE of the average estimator is 1/N times the
average of the MSEs of the individual estimators, where N is the
number of estimators in the population.  (The cross terms
E[n_i(x) n_j(x)] vanish under orthogonality, so
E[((1/N) sum_i n_i(x))^2] = (1/N^2) sum_i E[n_i(x)^2].)
This is a shocking result because all we have to do to get arbitrarily
good performance is to increase the size of our estimator population!
Of course, in practice the nets are correlated and the result no
longer holds.
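A quick numerical sketch of the claim (my own illustration, not code
from the thesis): draw errors n_i(x) as independent zero-mean
Gaussians, which makes them orthogonal in expectation, and compare the
MSE of the averaged estimator with 1/N times the average individual
MSE.

```python
# Simulate N estimators whose errors n_i(x) are independent zero-mean
# Gaussians, so E[n_i n_j] = 0 for i != j (the orthogonality condition).
import random

random.seed(0)
N = 20           # number of estimators in the population
M = 100_000      # number of sample points x

avg_mse_individual = 0.0   # average of the individual estimators' MSEs
mse_of_average = 0.0       # MSE of the averaged estimator

for _ in range(M):
    # independent errors -> orthogonal in expectation
    errors = [random.gauss(0.0, 1.0) for _ in range(N)]
    avg_mse_individual += sum(e * e for e in errors) / N
    mean_error = sum(errors) / N   # error of the average estimator
    mse_of_average += mean_error * mean_error

avg_mse_individual /= M
mse_of_average /= M

print(mse_of_average)              # close to 1/N = 0.05
print(avg_mse_individual / N)      # close to 0.05 as well
```

With correlated errors the cross terms no longer cancel, and the
1/N reduction degrades toward no improvement at all.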
Michael
--------------------------------------------------------------------------------
Michael P. Perrone Email: mpp at cns.brown.edu
Institute for Brain and Neural Systems Tel: 401-863-3920
Brown University Fax: 401-863-3934
Providence, RI 02912