"Orthogonality" of the generalizers being combined

Paul Munro munro at lis.pitt.edu
Sun Jul 16 00:08:07 EDT 1995


On Sat, 1 Jul 1995, David Wolpert wrote:

> 
> In his recent posting, Nathan Intrator writes
> 
> >>>
>  combining, or in the simple case
> averaging estimators is effective only if these estimators are made
> somehow to be independent.
> >>>
> 
> This is an extremely important point. Its importance extends beyond

(stuff deleted) 
> In other words, although those generalizers are about as different
> from one another as can be, *as far as the data set in question was
> concerned*, they were practically identical. This is a great flag that
> one is in a data-limited scenario. I.e., if very different
> generalizers perform identically, that's a good sign that you're
> screwed.
> 
> Which is a round-about way of saying that the independence Nathan
> refers to is always with respect to the data set at hand. This is
> discussed in a bit of detail in the papers referenced below.
> 
> ***
> 
> Getting back to the precise subject of Nathan's posting: Those
> interested in a formal analysis touching on how the generalizers being
> combined should differ from one another should read the Anders Krogh
> paper (to come out in NIPS7) that I mentioned in my previous
> posting. A more intuitive discussion of this issue occurs in my
> original paper on stacking, where there's a whole page of text
> elaborating on the fact that "one wants the generalizers being
> combined to (loosely speaking) 'span the space' of algorithms and be
> 'mutually orthogonal'" to as great a degree as possible.

(more stuff deleted)
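
As a concrete illustration of what "combining" generalizers can look like
in practice, here is a minimal sketch of a simple linear stacked
combination.  This is only a sketch, not the procedure from Wolpert's
stacking paper; the base learners, data, and parameters below are
arbitrary illustrative choices.

    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

    # Deliberately diverse base generalizers.
    base_learners = [
        LinearRegression(),
        DecisionTreeRegressor(max_depth=4, random_state=0),
        KNeighborsRegressor(n_neighbors=5),
    ]

    # Level-1 inputs: out-of-fold predictions, so the combiner never sees
    # a base learner's predictions on its own training data.
    Z = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_learners])

    # A simple linear combiner fit on the held-out predictions.
    combiner = Ridge(alpha=1e-3).fit(Z, y)
    print("combining weights:", combiner.coef_)
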

Bambang Parmanto and I have found that negative correlation among the
individual classifiers can improve committee performance even more than
zero correlation. So rather than a zero inner product (orthogonality), a
negative inner product is preferable.  Of course, this may be just a matter
of definition -- our comparisons are made using the error vectors on a
test set.  That is, it's better for errors to be independent than it is
for them to be coincident, but it's better still if the coincidence rate
falls below the rate expected for independent classifiers.  Note
that to achieve a significant level of negative correlation, the overall
generalization performance must be fairly high...
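
As a small illustration of the point (not the actual experiments; the
error-vector construction and parameters below are invented purely for
demonstration), consider three binary classifiers with roughly equal
individual error rates, combined by majority vote.  The error "vectors"
are 0/1 indicators over a test set, as above.

    import numpy as np

    rng = np.random.default_rng(0)
    n_test = 100_000
    p_err = 0.2  # marginal error rate of each individual classifier

    def committee_error(errors):
        # Majority-vote error for a (n_classifiers, n_test) 0/1 error matrix.
        votes_wrong = errors.sum(axis=0)
        return np.mean(votes_wrong > errors.shape[0] / 2)

    # Independent errors: each classifier errs independently.
    indep = rng.random((3, n_test)) < p_err

    # Coincident (positively correlated) errors: all classifiers tend to
    # err on the same "hard" examples.
    hard = rng.random(n_test) < p_err
    coincident = np.array([hard | (rng.random(n_test) < 0.02) for _ in range(3)])

    # Negatively correlated errors: at most one classifier errs on any
    # example, so the coincidence rate is below the independent rate.
    which = rng.integers(0, 3, size=n_test)
    errs = rng.random(n_test) < 3 * p_err
    negcorr = np.array([(which == k) & errs for k in range(3)])

    for name, e in [("independent", indep),
                    ("coincident", coincident),
                    ("negatively correlated", negcorr)]:
        corr = np.corrcoef(e[0].astype(float), e[1].astype(float))[0, 1]
        print(f"{name:22s} pairwise corr ~ {corr:+.2f}  "
              f"committee error = {committee_error(e):.3f}")

Run as written, the coincident committee is barely better than a single
classifier, the independent committee is clearly better, and the
negatively correlated committee is best of all -- which is the sense in
which a negative inner product beats orthogonality.
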

