Provable optimality of averaging generalizers

Tue Aug 17 12:18:01 EDT 1993

David Wolpert writes:

-->1) Say I have 3 real numbers, A, B, and X. In general, it's always
-->true that with C = [A + B] / 2, [C - X]^2 <= {[A - X]^2 + [B - X]^2} /
-->2. (This is exactly analogous to having the cost of the average guess
-->bounded above by the average cost of the individual guesses.)
-->
-->2) This means that if we had a choice of either randomly drawing one
-->of the numbers {A, B}, or drawing C, that on average drawing C would
-->give smaller quadratic cost with respect to X.
-->
-->3) However, as Michael points out, this does *not* mean that if we had
-->just the numbers A and C, and could either draw A or C, that we should
-->draw C. In fact, point (1) tells us nothing whatsoever about whether A
-->or C is preferable (as far as quadratic cost with respect to X is
-->concerned).
-->
-->4) In fact, now create a 5th number, D = [C + A] / 2. By the same
-->logic as in (1), we see that the cost (wrt/ X) of D is less than the
-->average of the costs of C and A. So to the exact same degree that (1)
-->says we "should" guess C rather than A or B, it also says we should
-->guess D rather than A or C. (Note that this does *not* mean that D's
-->cost is necessarily less than C's though; we don't get endlessly
-->diminishing costs.)
-->
-->5) Step (4) can be repeated ad infinitum, getting a never-ending
-->sequence of "newly optimal" guesses. In particular, in the *exact*
-->sense in which C is "preferable" to A or B, and therefore should
-->"replace" them, D is preferable to A or B, and therefore should
-->replace *them* (and in particular replace C). So one is never left
-->with C as the object of choice.
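
Points (1) through (4) are easy to check numerically. Here is a minimal
Python sketch; the function names and the sample values are made up for
illustration and are not from David's message. It relies on the identity
[C - X]^2 = {[A - X]^2 + [B - X]^2} / 2 - [A - B]^2 / 4, which is why
the inequality in (1) always holds.

```python
def cost(guess, x):
    """Quadratic cost of a guess with respect to the target x."""
    return (guess - x) ** 2


def average(p, q):
    """Plain average of two guesses."""
    return (p + q) / 2


def check_point_1(a, b, x):
    """Return C and the gap (mean cost of A and B) - (cost of C).

    The gap equals (a - b)**2 / 4, so it is never negative.
    """
    c = average(a, b)
    mean_cost = (cost(a, x) + cost(b, x)) / 2
    return c, mean_cost - cost(c, x)


# Hypothetical sample values, chosen only for illustration.
a, b = 0.0, 4.0
c = average(a, b)   # point (1): C
d = average(c, a)   # point (4): D

for x in (0.5, 3.0):
    _, gap = check_point_1(a, b, x)
    print("X =", x,
          "| cost(A) =", cost(a, x),
          "| cost(C) =", cost(c, x),
          "| cost(D) =", cost(d, x),
          "| gap from (1) =", gap)
```

With X = 0.5, A beats C; with X = 3.0, C beats A and D is worse than C.
That is point (3) and the parenthetical note in (4): the inequality only
compares an average against a random draw, and says nothing about which
single number is better.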

This argument does not imply a contradiction for averaging!

This argument shows the natural result of throwing away information.
Step (4) throws away number B.  Given that we no longer know B, number D
is the correct choice. (One could imagine such "forgetting" to be useful
in time-varying situations, which leads towards the Kalman filtering
that was mentioned in relation to averaging a couple of weeks ago.)
In Step (5), an infinite sequence is developed by successively throwing
away more and more of number B: D = [3A + B] / 4, the next guess is
[7A + B] / 8, and so on.  The infinite limit of Step (5) is number A.
In other words, we have thrown away all knowledge of B.
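
To make the "forgetting" concrete, here is a small Python sketch of the
sequence from Step (5), assuming each new guess is the average of the
previous guess and A. The function names and sample values are
hypothetical. The weight left on B halves at every step, so the guesses
approach A.

```python
def forgetting_sequence(a, b, steps):
    """Start from C = (a + b) / 2, then repeatedly average with a."""
    guess = (a + b) / 2
    guesses = [guess]
    for _ in range(steps):
        guess = (guess + a) / 2
        guesses.append(guess)
    return guesses


def weight_on_b(n):
    """Weight that b carries in guess number n (n = 0 is C, n = 1 is D)."""
    return 0.5 ** (n + 1)


# Hypothetical sample values, chosen only for illustration.
a, b = 0.0, 4.0
for n, guess in enumerate(forgetting_sequence(a, b, 4)):
    w = weight_on_b(n)
    # Each guess is the weighted mix (1 - w) * a + w * b.
    print(n, guess, "weight on B:", w)
```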

-->So (1) isn't really normative; it doesn't say one "should" guess the
-->average of a bunch of guesses:

Normative?  Hey, is this an ethics class!?  :-)

-->7) Choosing D is better than randomly choosing amongst C or A, just as
-->   choosing C is better than randomly choosing amongst A or B.
-->
-->8) This doesn't mean that given C, one should introduce an A and
-->   then guess the average of C and A (D) rather than C, just as
-->   this doesn't mean that given A, one should introduce a B and 
-->   then guess the average of A and B (C) rather than A.

Sure, if you're willing to throw away information.

Michael