NFL and practice

Joerg_Lemm lemm at LORENTZ.UNI-MUENSTER.DE
Wed Dec 13 09:46:52 EST 1995


Some remarks on Craig Hicks's arguments on cross-validation and NFL in
general, from my point of view:

One may discuss NFL for theoretical reasons, but
the conditions under which the NFL theorems hold
are not those normally met in practice.

1.) In short, NFL assumes that data, i.e. information of the form y_i=f(x_i),
do not contain information about function values on a non-overlapping test set.
This is done by postulating "unrestricted uniform" priors,
or uniform hyperpriors over nonuniform priors... (with respect to Craig's
two cases this average would include a third case: target and model are
anticorrelated, so anti-cross-validation works better) and "vertical"
likelihoods.
So, in an NFL setting the data never say anything about function values
for new arguments.
Under this assumption the result seems rather trivial, and one has to ask
how natural such an NFL situation is.
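
To make the uniform-prior setting concrete, here is a minimal sketch in
Python (a toy Boolean example of my own choosing, not taken from any NFL
paper): averaging over ALL functions consistent with the training data
makes every off-training-set prediction come out at exact chance level.

import itertools

# Uniform "prior": enumerate every Boolean function on a tiny input
# space and keep those consistent with the observed "sharp data".
X = list(range(4))                  # tiny input space
train = {0: 1, 1: 0}                # observed "sharp data" y_i = f(x_i)
off_train = [x for x in X if x not in train]

consistent = [f for f in itertools.product([0, 1], repeat=len(X))
              if all(f[x] == y for x, y in train.items())]

for x in off_train:
    avg = sum(f[x] for f in consistent) / len(consistent)
    print(f"P(f({x}) = 1 | data) = {avg}")   # exactly 0.5: chance level

Any deviation from chance would require restricting or reweighting the
function class, i.e. exactly the nonuniform prior that NFL excludes.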

2.) Information of the form y_i=f(x_i) is rather special and not what
we normally have. There is much information which is not of this
"single sharp data" type. (See the examples below.)

There is absolutely no reason why information which depends on more than
one f(x_i) should not be incorporated. (This can be done using nonuniform
priors or in a way more symmetrical to "sharp data".)
NFL just describes the situation in which we don't have
any such information but plenty of the (then quite useless)
"sharp data". But these sharp data are no less (maybe more) obscure
than other forms of information.

Information which is not of this "single sharp data" form but combines
many or all f(x_i) to produce one answer normally induces correlations
between target and generalizer if incorporated into the generalizer.
At the same time there is no real off-training set anymore!

Examples:

3) Information such as symmetries (even if only approximate), maxima,
Fourier components (and much, much more...) involves more than one f(x_i).
Fourier components, for example, can be seen as sharp data, but for different
basis vectors, i.e. asking for momentum instead of location.
This shows again that the definition of "sharp data" corresponds to choosing
a "basis of questions" and is no natural entity!


4) Real measurements (especially of continuous variables)
normally also do NOT have the form y_i=f(x_i)!
They mostly perform some averaging over f(x_i), or
at least there is some noise on the x_i (as small as you like, but present).
In the latter case of "sharp" noise, posing the same question several times
also gives you an average of several (nearby) y
with different x_i of the underlying true function.
In both cases the averaging is equivalent to regularization
for the "effective" function which we can observe!
This shows that smoothness of the expectation (in contrast to uniform priors)
is the result of the measurement process and is therefore
a real phenomenon for "effective" functions.
There is no need to see it as just a subjective prior!
(The same could be said on a quantum-mechanical level, but that's another
story.)
It follows that NFL results do NOT hold
for the "effective" functions in such situations,
even if one assumes NFL for the underlying true functions.
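
A minimal numerical sketch of this averaging argument (assumed Gaussian
input noise and a toy function; the setup is mine, not from the
literature): the observable effective function is a smoothed version of
a rough "true" function.

import numpy as np

rng = np.random.default_rng(0)
z = np.linspace(0, 1, 200)                 # the questions we can pose
f = lambda x: np.sign(np.sin(40 * x))      # rough underlying "true" function

sigma = 0.02                               # input-noise scale
n_repeats = 2000                           # ask each question many times
noisy_x = z[:, None] + sigma * rng.standard_normal((len(z), n_repeats))
f_eff = f(noisy_x).mean(axis=1)            # observable "effective" function

print("largest jump of true function:     ", np.abs(np.diff(f(z))).max())
print("largest jump of effective function:", np.abs(np.diff(f_eff)).max())

The effective function comes out smooth although the underlying function
is not; the smoothing kernel is just the input-noise distribution.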

5.) NFL again:
Averaging or noise in the input space of the x_i requires a
probability distribution on that space
which can be defined independently of any specific function.
Noise means that x_i is a random variable depending on
the actual question z_i, i.e. p(actual argument = x_i | question = z_i),
and it is the resulting effective f(z_i) which we can observe.
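
In formulas, for a discretized input space this reads
f_eff(z_i) = sum_x p(x|z_i) f(x). A small sketch (toy kernel and
function of my own choosing):

import numpy as np

X = np.linspace(0, 1, 100)                 # discretized input space
f = np.sign(np.sin(20 * X))                # rough "true" function

def p_x_given_z(z, sigma=0.05):
    # assumed Gaussian relation between question z and actual argument x
    w = np.exp(-0.5 * ((X - z) / sigma) ** 2)
    return w / w.sum()

f_eff = lambda z: p_x_given_z(z) @ f       # expectation of f under p(x|z)
print(f_eff(0.30), f_eff(0.31))            # nearby questions, similar answers

Note that p(x|z) lives on the input space alone; the induced smoothness
of f_eff needs no assumption about f itself.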

If you don't accept a given p(x_i|z_i), I am sure you can average over
"all possible" such relations with unrestricted "uniform" priors and
find that it is impossible to obtain any information about any function
without assuming a priori that you know something about what you are asking.
This could be seen as another NFL theorem, this time for questions: you do
not even get information about a single function value if you don't know
(assume, define) a priori what you are asking!

6.) With respect to the underlying "true" function,
off-training-set error itself, an important concept for NFL, is in general
no longer a measurable quantity if input noise or averaging is present!
(For simplicity, assume that the noise or averaging covers all
questions x_i. Then in the case of noise each x_i only has some
probability of belonging to the "true" training set,
while averaging involves all questions x_i anyway.)
So for the "true" functions there remains nothing NFL can say anything about,
and for the "effective" functions NFL is not valid!

To conclude: 

In many interesting cases "effective" function values contain information
about other function values, and NFL does not hold!

The very special treatment of "sharp data" in comparison to other
information should be examined in many more learning theories.

Joerg Lemm 
(Institute for Theoretical Physics I, University of Muenster, Germany)



