The last of a dying thread

David Wolpert dhw at santafe.edu
Tue Dec 12 17:25:06 EST 1995


Some comments on the NFL thread.


Huaiyu Zhu writes

>>>
2. The *mere existence* of structure guarantees a (not uniformly-random)
algorithm is as likely to lose you a million as to win you a million,
even in the long run.  It is the *right kind* of structure that makes
a good algorithm good.
>>>

This is a crucial point. It also seems to be one lost on many of the
contributors to this thread, even those who posted after Zhu did. Please
note in particular that the knowledge that "the universe is highly
compressible" can NOT, by itself, be used to circumvent NFL.
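
To make the uniform-average version of this concrete, here is a toy
sketch in Python (my own illustration, not from the papers; the
three-point domain, the training split, and the two learners are all
hypothetical): averaged uniformly over every Boolean target, two
deliberately opposed learners have exactly the same off-training-set
error.

from itertools import product

X = [0, 1, 2]          # tiny input space (hypothetical)
train_x = [0, 1]       # training inputs
test_x = [2]           # off-training-set (OTS) inputs

def learner_majority(train_pairs, x):
    # predict the majority label seen in training (ties -> 0)
    labels = [y for _, y in train_pairs]
    return int(sum(labels) > len(labels) / 2)

def learner_antimajority(train_pairs, x):
    # deliberately predict the opposite of the majority label
    return 1 - learner_majority(train_pairs, x)

def avg_ots_error(learner):
    # average OTS error, uniformly over all 2^|X| Boolean targets
    errors = []
    for target in product([0, 1], repeat=len(X)):
        train_pairs = [(x, target[x]) for x in train_x]
        errors += [int(learner(train_pairs, x) != target[x]) for x in test_x]
    return sum(errors) / len(errors)

print(avg_ots_error(learner_majority))      # 0.5
print(avg_ots_error(learner_antimajority))  # 0.5 - identical, as NFL says

The sketch only shows the baseline equality; which non-uniform averages
do or do not break it is exactly what the papers analyze, and "the
targets are compressible" is not, by itself, enough to single out a
winner.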

I can only plead again: those who are interested in this issue should
look at the papers directly, so they have at least a passing familiarity
with the subject before discussing it. :-)

ftp.santafe.edu, pub/dhw_ftp, nfl.1.ps.Z and nfl.2.ps.Z.



Craig Hicks then writes:

>>>
However, I interpret the assertion that anti-cross-validation can be expected
to work as well as cross-validation to mean that we can equally well expect
cross-validation to lie.  That is, if cross-validation is telling us that the
generalization error is decreasing, we can expect, on average, that the true
generalization error is not decreasing.

Isn't this a contradiction, if we assume that the samples are really randomly
chosen?  Of course, we can a posteriori always choose a worst case function
which fits the samples taken so far, but contradicts the learned model
elsewhere.  But if we turn things around and randomly sample that deceptive
function anew, the learned model will probably be different, and
cross-validation will behave as it should.
>>>

That's part of the power of the NFL theorems - they prove that Hicks'
intuition, an intuition many people share, is in fact wrong.
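
Here is a minimal sketch of that failure (again my own toy construction;
the four-point domain, the two constant base learners, and the
leave-one-out scheme are all hypothetical): select between two base
learners either by cross-validation or by anti-cross-validation, and
average the off-training-set error uniformly over every Boolean target.

from itertools import product

X = [0, 1, 2, 3]
train_x = [0, 1, 2]     # training inputs
test_x = 3              # off-training-set input

def always_zero(train, x):
    return 0

def always_one(train, x):
    return 1

base_learners = [always_zero, always_one]

def loo_cv_error(learner, train):
    # leave-one-out cross-validation error on the training pairs
    errs = []
    for i, (x, y) in enumerate(train):
        held_out = train[:i] + train[i+1:]
        errs.append(int(learner(held_out, x) != y))
    return sum(errs) / len(errs)

def select(train, anti=False):
    # cross-validation picks the lower CV error; anti-CV picks the higher
    scores = [loo_cv_error(l, train) for l in base_learners]
    if anti:
        best = max(range(len(scores)), key=lambda i: scores[i])
    else:
        best = min(range(len(scores)), key=lambda i: scores[i])
    return base_learners[best]

def avg_ots_error(anti):
    # average OTS error of the selected learner, over all Boolean targets
    errs = []
    for target in product([0, 1], repeat=len(X)):
        train = [(x, target[x]) for x in train_x]
        chosen = select(train, anti=anti)
        errs.append(int(chosen(train, test_x) != target[test_x]))
    return sum(errs) / len(errs)

print(avg_ots_error(anti=False))  # selection by cross-validation:      0.5
print(avg_ots_error(anti=True))   # selection by anti-cross-validation: 0.5

Both rules come out at 0.5 in this toy average because the target's
value at the off-training-set point is independent of everything seen
in training.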



>>>
I think this follows from the principle that the empirical distribution over
an ever larger number of samples converges to the true distribution of a
single sample (assuming the true distribution is stationary).
>>>

Nope. The central limit theorem is not directly germane. See all the
previous discussion on NFL and Vapnik.
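
A small sketch of why (my own hypothetical setup): under a uniform prior
over Boolean targets, conditioning on any number of observed
input-output pairs leaves the value at an unseen input exactly 50/50 -
convergence of what you see on the sampled points says nothing about the
unsampled ones.

from itertools import product

X = range(4)                    # hypothetical finite input space
observed = {0: 1, 1: 0, 2: 1}   # input -> output pairs already seen
unseen_x = 3                    # an input never sampled

# All Boolean targets consistent with the observed pairs:
consistent = [t for t in product([0, 1], repeat=len(X))
              if all(t[x] == y for x, y in observed.items())]

# Fraction of consistent targets with target(unseen_x) = 1:
print(sum(t[unseen_x] for t in consistent) / len(consistent))   # 0.5

However many pairs you add to "observed", the printed fraction stays at
0.5, as long as the unseen input itself is never sampled.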




>>>
CV is nothing more than the random sampling of prediction ability.  If the
average over the ensemble of samplings of this ability on two different models A
and B comes out showing that A is better than B, then by definition A is better
than B.  This assumes only that the true domain and the ensemble of all
samplings coincide.  Therefore CV will not, on average, cause a LOSS in
prediction ability.  That is, when it fails, it fails gracefully,
on average.  It cannot be consistently deceptive.
	Fortunately, CV will report this (failure to generalize) by
showing a zero correlation between prediction and true value on the
off-training-set data.  (Of course this is only the performance of CV
on average over the ensemble of off-training-set data sets; CV may be
deceptive for a single off-training-set data set.)
>>>

This is wrong (or at best misleading). Please read the NFL papers. In
fact, if the head-to-head minimax hypothesis concerning cross-validation
presented in those papers is correct, cross-validation is wrong more often
than it is right, in which case CV is "deceptive" more often (!!!)
than not.



Lev Goldfarb wrote

>>>
Strange as it may sound at first, try to inductively learn a subgroup
of some large group with the group structure completely hidden.  No
statistics will reveal the underlying group structure.
>>>

It may help if people read some of the many papers (Cox, de Finetti,
Erickson and Smith, etc.) that prove that the only consistent
way of dealing with uncertainty is via probability theory. In other
words, there is nothing *but* statistics in the real world. (The
knowledge that you're looking for a group may enter as a prior, but
it is statistics nonetheless.)




David Wolpert

