Committees

Michael P. Perrone mpp at cns.brown.edu
Tue Aug 3 01:45:18 EDT 1993


David Wolpert writes:
-->Many of the results in the literature which appear to dispute this
-->are simply due to use of an error function which is not restricted to
-->being off-training set. In other words, there's always a "win" 
-->if you perform rationally on the training set (e.g., reproduce it
-->exactly, when there's no noise), if your error function gives you
-->points for performing rationally on the training set. In a certain
-->sense, this is trivial, and what's really interesting is off-training
-->set behavior. In any case, this automatic on-training set win is all
-->those aforementioned results refer to; in particular, they imply essentially
-->nothing concerning performance off of the training set.

In the case of averaging for MSE optimization (the meat and potatoes of 
neural networks), or for any other convex error measure, the improvement due
to averaging is independent of the data distribution - on-training set or off-.
It depends only on the convexity of the optimization measure, not on where
the test points are drawn from.
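To spell the step out (in my own notation, which is not from the discussion
above: f_1,...,f_N are the population of estimators, \bar{f} their simple
average, y the target, and E the expectation over whatever input distribution
one likes, on- or off-training set):

    \bar{f}(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x),
    \qquad
    E\big[(\bar{f}(x)-y)^2\big]
      = E\Big[\Big(\tfrac{1}{N}\sum_{i=1}^{N}\big(f_i(x)-y\big)\Big)^2\Big]
      \le \frac{1}{N}\sum_{i=1}^{N} E\big[(f_i(x)-y)^2\big],

where the inequality is Jensen's inequality (convexity of z^2).  The same
argument goes through for any convex error measure, and no step in it uses
the distribution of x.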

It is important to note that this result does NOT say the average is better
than any individual estimate - only that it is better than the average
performance of the population.  For example, if one had a reliable selection
criterion for deciding which element of a population of estimators was the
best, and that estimator was better than the average estimator, then one
should just choose that better estimator.  (Going one step further, simply
use the selection criterion to choose the best estimator from all possible
weighted averages of the elements of the population.)  As David Wolpert
pointed out, any estimator can be confounded by a pathological data sample,
so there is no *guaranteed* method for deciding, in all cases, which
estimator in a population is best.  Weak (as opposed to guaranteed) selection
criteria do exist in the form of cross-validation (in all of its flavors).
Coupling cross-validation with averaging is a good idea since one gets the
best of both worlds, particularly for problems with insufficient data.
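For concreteness, here is a rough sketch of what coupling cross-validation
(as the weak selection criterion) with averaging could look like in practice.
The code and all of its names (cv_mse, make_fit, the ridge-style example
estimators) are just my illustration of the idea, not anything from the
results being discussed: score each member of the population and their simple
average by cross-validated MSE, and keep whichever scores best.

    import numpy as np

    def kfold_indices(n, k, rng):
        """Shuffle indices and split them into k roughly equal folds."""
        idx = rng.permutation(n)
        return np.array_split(idx, k)

    def cv_mse(fit_fns, X, y, k=5, seed=0):
        """Cross-validated MSE of each estimator and of their simple average.

        fit_fns: list of functions, each mapping (X_train, y_train) to a
        predictor function predict(X_test).
        """
        rng = np.random.default_rng(seed)
        folds = kfold_indices(len(y), k, rng)
        errs = np.zeros(len(fit_fns))   # individual estimators
        err_avg = 0.0                   # simple average of the population
        for test_idx in folds:
            train_mask = np.ones(len(y), bool)
            train_mask[test_idx] = False
            preds = []
            for j, fit in enumerate(fit_fns):
                predict = fit(X[train_mask], y[train_mask])
                p = predict(X[test_idx])
                preds.append(p)
                errs[j] += np.mean((p - y[test_idx]) ** 2) / k
            err_avg += np.mean((np.mean(preds, axis=0) - y[test_idx]) ** 2) / k
        return errs, err_avg

    # Purely illustrative population: linear fits with different ridge penalties.
    def make_fit(ridge):
        def fit(Xtr, ytr):
            A = Xtr.T @ Xtr + ridge * np.eye(Xtr.shape[1])
            w = np.linalg.solve(A, Xtr.T @ ytr)
            return lambda Xte: Xte @ w
        return fit

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)
    fits = [make_fit(r) for r in (0.01, 0.1, 1.0, 10.0)]
    errs, err_avg = cv_mse(fits, X, y)
    # Weak selection: keep the average unless some individual clearly beats it.
    print("CV MSE per estimator:", np.round(errs, 4))
    print("CV MSE of the average:", round(err_avg, 4))
    print("choose:", "individual %d" % errs.argmin() if errs.min() < err_avg
          else "average")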

I think that another very interesting direction for research (as David Wolpert
alluded to) is the investigation of more reliable selection criteria.

-Michael


