NFL for NFL
Joerg_Lemm
lemm at LORENTZ.UNI-MUENSTER.DE
Thu Nov 23 07:26:30 EST 1995
I would like to add a comment on the NFL discussion from my point of view.
(Thanks to David Wolpert for remaining active in this discussion,
which he initiated.)
1.) If there is no relation between the function values
on the test and training sets
(i.e. P(f(x_j)=y|Data) equals the unconditional P(f(x_j)=y) ),
then, given only training examples y_i = f(x_i) (= data)
from a function, it is clear that I cannot learn anything
about the function's values at other arguments,
i.e. about f(x_j) with x_j not equal to any x_i (a nonoverlapping test set).
2.) We consider two such (influence) relations P(f(x_j)=y|Data):
one, named A, for true nature (= target), and one, named B, for our
model under study (= generalizer).
Let P(A and B) be the joint probability distribution over the
influence relations for target and generalizer.
3.) Of course, we do not know P(A and B), but in good old Bayesian tradition
we can construct a (hyper-)prior P(C) over the family of joint
distributions C = P(A and B).
4.) NFL now uses the very special prior assumption
P(A and B) = P(A)P(B), or equivalently P(B|A) = P(B), which means
NFL postulates that there is (on average) no relation between nature
and model. No wonder that, averaging over targets P(A) or over
generalizers P(B), cross-validation works
as well (or as badly) as anti-cross-validation or anything else in such cases.
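This independence assumption can be made concrete in a tiny simulation (my own illustrative setup, not from the original NFL papers): put a uniform prior over all boolean target functions on a small domain, so the test-set values are independent of the training data. Then any fixed generalizer, however it uses the training examples, averages exactly 50% off-training-set accuracy.

```python
import itertools

# Domain of 4 points; train on the first 2, test off-training-set on the rest.
X = [0, 1, 2, 3]
train, test = X[:2], X[2:]

# Enumerate all 2^4 boolean targets f: X -> {0,1}
# (the "uniform prior over targets" of the NFL setting).
total, correct = 0, 0
for f in itertools.product([0, 1], repeat=len(X)):
    # A generalizer that sees only the training values f(x_i):
    # here, predict the majority training label (ties -> 0).
    train_labels = [f[x] for x in train]
    guess = int(sum(train_labels) > len(train_labels) / 2)
    for x in test:
        total += 1
        correct += (guess == f[x])

# Averaged over all targets, off-training-set accuracy is exactly 1/2:
print(correct / total)  # -> 0.5
```

Swapping in any other rule for `guess` (including anti-majority) leaves the average at 0.5, which is the NFL point in miniature.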
5.) But target and generalizer live on the same planet (sharing the
same laws, environment, history and maybe even building blocks)
so we have very good reasons to assume a bias for (hyper-)priors towards
correlated P(A and B) not equal to the uncorrelated product P(A)P(B)!
But that's not all: we do have information which is not of the
form y = f(x). We know that the probability of many relations in nature
being continuous on certain scales seems to be high (-> regularization).
We can have additional information about other properties of the
function, e.g. symmetries (compare Abu-Mostafa's concept of hints),
which produces correlations between A (= target) and B (= model).
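To illustrate such a correlation (again my own sketch, with an assumed smoothness prior rather than anything from the discussion): if targets are biased toward continuity, a generalizer that exploits this bias, e.g. nearest-neighbor, does beat chance on the off-training-set points.

```python
import random

random.seed(0)
X = list(range(8))
train, test = X[::2], X[1::2]

def smooth_target():
    # Prior biased toward continuity: a boolean value that flips
    # only with small probability from one point to the next.
    f, v = [], random.randint(0, 1)
    for _ in X:
        f.append(v)
        if random.random() < 0.2:
            v = 1 - v
    return f

total, correct = 0, 0
for _ in range(2000):
    f = smooth_target()
    for x in test:
        # Nearest-neighbor generalizer: copy the label of the closest
        # training point -- this exploits the assumed smoothness.
        nn = min(train, key=lambda t: abs(t - x))
        total += 1
        correct += (f[nn] == f[x])

print(correct / total)  # well above 0.5 under this smoothness prior
```

Here target and generalizer "share a law" (continuity), so P(A and B) is no longer the independent product and learning pays off.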
In this sense I agree with David Wolpert on looking at
>>>
how to characterize the needed relationship between the set of
generalizers and the prior that allows cross-validation to work.
>>>
To summarize it in a provocative way:
there is no free lunch for NFL:
only if you assume that no relation between target and model exists
do you find no relation between target and model!
And to be precise, I say that it is rational to believe
(and David does so too, I think) that in real life cross-validation
works better than anti-cross-validation in more cases.
Joerg Lemm