NFL for NFL

Joerg_Lemm lemm at LORENTZ.UNI-MUENSTER.DE
Thu Nov 23 07:26:30 EST 1995


I would like to make a comment on the NFL discussion from my point of view.
(Thanks to David Wolpert for remaining active in this discussion
he initiated.)

1.) If there is no relation between the function values
    on the test and training sets
    (i.e. P(f(x_j)=y|Data) is equal to the unconditional P(f(x_j)=y)),
    then, given only training examples y_i = f(x_i) (= data)
    from a function, it is clear that I cannot learn anything
    about the values of the function at other arguments
    (i.e. about f(x_j) with x_j not equal to any x_i, a non-overlapping test set).
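
    To spell out the single step behind this (in the notation above, for
    any loss function L and any prediction rule y_hat(Data) at a
    non-overlapping test point x_j):

        E[ L | Data ] = sum_y L( y_hat(Data), y ) * P(f(x_j)=y | Data)
                      = sum_y L( y_hat(Data), y ) * P(f(x_j)=y),

    so the data enter only through our own guess y_hat(Data), and the best
    achievable expected loss is the same whether we look at the data or not.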

2.) We consider two such (influence) relations P(f(x_j)=y|Data):
    one, named A, for true nature (= target), and one, named B, for our
    model under study (= generalizer).
    Let P(A and B) be the joint probability distribution over these
    influence relations for target and generalizer.

3.) Of course, we do not know P(A and B), but in good old Bayesian tradition
    we can construct a (hyper-)prior P(C) over the family of joint
    distributions C = P(A and B).
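
    (A minimal illustration: such a hyper-prior might put weight w on a
    joint distribution C_1 in which A and B are strongly correlated, and
    weight 1-w on the independent product C_0 = P(A)P(B); the particular
    mixture is of course only for illustration. The NFL assumption
    discussed next then corresponds to the extreme choice w = 0.)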
 
4.) NFL now uses the very special prior assumption
    P(A and B) = P(A)P(B), or equivalently P(B|A) = P(B), which means
    NFL postulates that there is (on average) no relation between nature
    and model. No wonder that (averaging over targets P(A) or over
    generalizers P(B)) cross-validation works
    as well (or as badly) as anti-cross-validation or anything else in such cases.
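
    This averaging argument can be checked numerically. Below is a small
    sketch (a toy setup of my own choosing, not from the NFL papers):
    Boolean targets on the domain {0,1,2,3,4}, a simple majority-vote
    generalizer standing in for any "sensible" method, and its
    deliberately perverse twin standing in for anti-cross-validation.
    Averaged uniformly over ALL targets, the two tie exactly:

    from itertools import product

    # Average uniformly over all 2^5 Boolean targets on {0,1,2,3,4}.
    # Training inputs: {0,2,4}; off-training-set test inputs: {1,3}.
    train_x, test_x = [0, 2, 4], [1, 3]

    def majority(labels):        # a "sensible" generalizer
        return int(sum(labels) > len(labels) // 2)

    def anti_majority(labels):   # its deliberately perverse twin
        return 1 - majority(labels)

    for name, learner in [("majority", majority),
                          ("anti-majority", anti_majority)]:
        hits = trials = 0
        for f in product([0, 1], repeat=5):    # all 32 targets
            guess = learner([f[x] for x in train_x])
            for x in test_x:
                hits += (guess == f[x])
                trials += 1
        print(name, hits / trials)             # both print 0.5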

5.) But target and generalizer live on the same planet (sharing the
    same laws, environment, history, and maybe even building blocks),
    so we have very good reasons to bias our (hyper-)priors towards
    correlated P(A and B), not equal to the uncorrelated product P(A)P(B)!
    But that's not all: we do have information which is not of the
    form y = f(x). We know that the probability for many relations in
    nature to be continuous on certain scales seems to be high
    (-> regularization). We can have additional information about other
    properties of the function, e.g. symmetries (compare Abu-Mostafa's
    concept of hints), which produces correlations between A (= target)
    and B (= model). In this sense I agree with David Wolpert on looking
>>>
    how to characterize the needed relationship between the set of
    generalizers and the prior that allows cross-validation to work.
>>>
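
    The flip side also shows up in the toy setup above. Restricting the
    average to "smooth" targets (at most one jump along the ordered
    domain -- a crude stand-in for a continuity/regularization prior; the
    restriction and the learner are again only illustrative) immediately
    separates the sensible generalizer from its perverse twin:

    from itertools import product

    # Same toy domain, but now average only over "smooth" targets:
    # Boolean functions on {0,1,2,3,4} with at most one jump
    # f[i] != f[i+1].  Training inputs: {0,2,4}; test inputs: {1,3}.
    train_x, test_x = [0, 2, 4], [1, 3]

    def majority(labels):
        return int(sum(labels) > len(labels) // 2)

    smooth = [f for f in product([0, 1], repeat=5)
              if sum(f[i] != f[i + 1] for i in range(4)) <= 1]

    for name, learner in [("majority", majority),
                          ("anti-majority", lambda ls: 1 - majority(ls))]:
        hits = trials = 0
        for f in smooth:                       # 10 smooth targets
            guess = learner([f[x] for x in train_x])
            for x in test_x:
                hits += (guess == f[x])
                trials += 1
        print(name, hits / trials)             # 0.8 vs. 0.2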

To summarize it in a provocative way:
There is no free lunch for NFL:
only if you assume that there is no relation between target and model
will you find no relation between target and model!

And to be precise, I say that it is rational to believe
(and I think David does so too) that in real life cross-validation
works better in more cases than anti-cross-validation.
   
Joerg Lemm

