Random vs. single-valued rules

dhw@santafe.edu
Wed Aug 18 18:37:32 EDT 1993


tmb writes:

>>>>>
In reply to dhw@santafe.edu:
|The bottom line, as I see it: arguments like Michael's show that one
|should always use a single-valued learning algorithm rather than a
|stochastic one.

From context, I'm assuming that you are referring to "deterministic"
vs. "randomized" decision rules, as they are called in decision theory
("stochastic learning algorithm" means something different to me, but
maybe I'm just misinterpreting your posting).

Picking an opinion from a pool of experts randomly is clearly not a
particularly good randomized decision rule in most cases.  However,
there are cases in which properly chosen randomized decision rules are
important (any good introduction to Bayesian statistics should discuss
this).  Unless there is an intelligent adversary involved, such cases
are probably mostly of theoretical interest, but nonetheless, a
randomized decision rule can be "better" than any deterministic one.
>>>>
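
(As a minimal illustration of the adversary case tmb mentions: in a
matching-pennies-style game, any deterministic rule can be exploited
for a worst-case loss of 1, while a uniform coin flip caps the
worst-case expected loss at 0.5. The game and names below are a
made-up sketch, not something from tmb's posting:)

    actions = [0, 1]

    def loss(decision, adversary):
        # The adversary wants to match our action; a match costs us 1.
        return 1.0 if decision == adversary else 0.0

    # Any fixed (deterministic) action can be matched by the adversary:
    det_worst = min(max(loss(d, a) for a in actions)
                    for d in actions)                  # = 1.0

    # The uniform randomized rule caps worst-case *expected* loss at 0.5:
    rand_worst = max(0.5 * loss(0, a) + 0.5 * loss(1, a)
                     for a in actions)                 # = 0.5

    print(det_worst, rand_worst)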

Implicit in my statement was the context of Michael Perrone's
posting (to which I was responding): convex loss functions, and the
fact that one particular "single-valued learning algorithm" one might
use is the one Michael advocates: averaging over your pool of
experts.
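
(For concreteness, here is a minimal numerical sketch of the two
rules being compared, using squared error as the convex loss; the
pool of "experts" is just made-up noisy guesses:)

    import numpy as np

    rng = np.random.default_rng(0)
    target = 1.0
    experts = target + rng.normal(0.0, 0.5, size=10)  # noisy guesses

    def sq_loss(guess):
        return (guess - target) ** 2

    # Single-valued rule: average the pool, then incur the loss once.
    loss_of_average = sq_loss(experts.mean())

    # Randomized rule: draw one expert uniformly; its expected loss
    # is the average of the individual losses.
    expected_loss_of_draw = sq_loss(experts).mean()

    # Jensen's inequality for the convex squared loss guarantees this:
    assert loss_of_average <= expected_loss_of_draw
    print(loss_of_average, expected_loss_of_draw)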

Obviously one can choose a single-valued learning algorithm
which performs more poorly than randomly drawing from a pool of
experts. To spell out the relevant points:

1) One can prove that (for convex loss) averaging over the pool is
preferable to randomly sampling the pool (Michael's result; note that
assumptions about lack of correlations between the experts and the
like apply).

2) One cannot prove that averaging beats any other single-valued
use of the experts.

3) Note that neither (1) nor (2) contradicts the assertion that
there might be single-valued algorithms which perform worse than
randomly sampling the pool.

4) For the case of a 0-1 loss function and a uniform prior over
target functions, it doesn't matter how you guess; all algorithms
perform the same, both averaged over data sets and for any one
particular data set (as far as off-training-set average loss is
concerned). See the sketch below.
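
(A minimal sketch of point (4), on a toy 3-bit input space; the
particular training set and guessing rules below are arbitrary
choices of mine, purely for illustration:)

    from itertools import product

    X = range(8)                 # toy input space: all 3-bit inputs
    train = {0: 0, 1: 1, 2: 0}   # a fixed training set (input -> label)
    off_train = [x for x in X if x not in train]

    # Uniform prior over targets: every boolean function on X that is
    # consistent with the training data is equally likely.
    consistent = [f for f in product([0, 1], repeat=len(X))
                  if all(f[x] == y for x, y in train.items())]

    def ots_error(guess):
        # Average off-training-set 0-1 loss of a fixed guess,
        # taken over the prior.
        errs = sum(f[x] != guess[x]
                   for f in consistent for x in off_train)
        return errs / (len(consistent) * len(off_train))

    guess_a = [0] * 8                   # one "algorithm": always guess 0
    guess_b = [1, 0, 1, 1, 0, 0, 1, 0]  # another arbitrary rule
    print(ots_error(guess_a), ots_error(guess_b))  # both are 0.5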



David Wolpert


