NFL and practice

Joerg_Lemm lemm at LORENTZ.UNI-MUENSTER.DE
Fri Dec 15 09:28:49 EST 1995


Huaiyu Zhu responded to
>> One may discuss NFL for theoretical reasons, but
>> the conditions under which NFL-Theorems hold
>> are not those which are normally met in practice.
and wrote
>Exactly the opposite.  The theory behind NFL is trivial (in some sense).
>The power of NFL is that it deals directly with what is routinely
>practiced in the neural network community today.

That depends on how you understand practice.
E.g. in nearly all cases functions are somewhat smooth.
This is a prior which exists in reality (for example because
of input noise in the measuring process),
and the situation would be hopeless
if we did not use this fact in practice.
(That is exactly what NFL says, too.)
But if Huaiyu means that it is necessary to think about
the priors in "practice" explicitly, then I fully agree!
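
To make the input-noise point concrete, here is a tiny numerical
sketch (my own illustration in Python/NumPy, not part of the original
argument; the function f and the noise level sigma_in are arbitrary
choices): a function that jumps becomes smooth once every measurement
effectively averages over noisy inputs, so neighbouring "effective"
function values cannot differ by much.

  import numpy as np

  rng = np.random.default_rng(0)
  f = lambda x: np.sign(np.sin(3 * x))   # a deliberately non-smooth "true" f
  sigma_in = 0.3                         # assumed input-noise level

  def observe(x, n=20000):
      # effective function value at nominal input x, averaged over input noise
      return f(x + sigma_in * rng.standard_normal(n)).mean()

  xs = np.linspace(0.0, 2.0, 41)
  ys = np.array([observe(x) for x in xs])

  steps = np.abs(np.diff(ys))
  print("largest step between neighbouring effective values:", steps.max())
  print("(the underlying f itself jumps by 2.0)")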

But what I wanted to say is:
WE DO HAVE "PRIORS" (BETTER: CORRELATIONS BETWEEN
ANSWERS TO DIFFERENT QUESTIONS) IN MOST CASES,
and they are NOT obscure, but very often
at least as well MEASURABLE
as "normal" sharp data y_i = f(x_i).
Even more: situations without "priors" are VERY artificial.
So if we specify the "priors" (and the lesson from NFL is
that we should if we want to make a good theory),
then we cannot use NFL anymore. (What should it be used for then?)
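
For comparison, here is a minimal sketch of the NFL averaging argument
itself (again my own toy construction in Python): averaged over ALL
Boolean targets on a tiny input space, any fixed learner has the same
off-training-set accuracy, namely chance level 0.5.

  from itertools import product

  X = list(range(4))            # tiny input space
  train_x = [0, 1]              # inputs seen during training
  test_x  = [2, 3]              # off-training-set inputs

  def learner_constant(train):  # always predicts 0
      return lambda x: 0

  def learner_majority(train):  # predicts the majority training label
      ones = sum(train.values())
      return lambda x: int(ones * 2 >= len(train))

  for learner in (learner_constant, learner_majority):
      acc = []
      for target in product([0, 1], repeat=len(X)):  # all 16 target functions
          h = learner({x: target[x] for x in train_x})
          acc.append(sum(h(x) == target[x] for x in test_x) / len(test_x))
      print(learner.__name__, sum(acc) / len(acc))   # both print 0.5

Only when the average is restricted to a correlated subset of targets
can one learner come out ahead of another.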
 
>Joerg continued with examples of various priors of practical concern,
>including smoothness, symmetry, positive correlation, iid samples, etc.
>These are indeed very important priors which match the real world,
>and they are the implicit assumptions behind most algorithms.
>
>What NFL tells us is: If your algorithm is designed for such a prior,
>then say so explicitly so that a user can decide whether to use it.
>You can't expect it to be also good for any other prior which you have
>not considered.  In fact, in a sense, you should expect it to perform
>worse than a purely random algorithm on those other priors.

Maybe the problem is that Huaiyu Zhu uses the word "PRIOR" for every
piece of information which is not of the sharp data form y_i = f(x_i).
It suggests that we know something before starting our generalizer.
NO, that is not the normal case!!! I mentioned many examples
(like measurement with input noise) where "priors" are just normal
information which should be used DURING learning, like sharp data!
(Sharp data might not even be available at all!) And of course using
wrong "priors" is similar to using wrong sharp data.
But I fully agree that most algorithms use "prior" information
only implicitly and that there is a lot of theoretical work to do.
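
As a sketch of what "using priors DURING learning like sharp data" can
mean (an assumed toy setup in Python/NumPy; the grid, the three data
points and the weight lam are made up for illustration): the correlation
"neighbouring function values are close" enters the fit as extra soft
equations, on the same footing as the sharp data y_i = f(x_i).

  import numpy as np

  xs = np.linspace(0, 1, 21)            # grid on which we estimate f
  data_idx = [2, 10, 18]                # only three sharp data points
  data_y   = [0.1, 0.9, 0.2]
  lam = 5.0                             # weight of the correlation "prior"

  rows, rhs = [], []
  for i, y in zip(data_idx, data_y):    # sharp data: f(x_i) = y_i
      r = np.zeros(len(xs)); r[i] = 1.0
      rows.append(r); rhs.append(y)
  for i in range(len(xs) - 1):          # prior: f(x_{i+1}) - f(x_i) ~ 0
      r = np.zeros(len(xs)); r[i] = -lam; r[i + 1] = lam
      rows.append(r); rhs.append(0.0)

  f_est, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
  print(f_est.round(2))                 # smooth estimate balancing data and prior

Without the correlation rows, the values between the three data points
would be completely undetermined -- which is exactly the NFL situation.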

In response to
>> In many interesting cases "effective" function values contain information
>> about other function values and NFL does not hold!
Huaiyu Zhu continues:
>This is like saying "In many interesting cases we do have energy sources,
>and we can make a machine running forever, so the natural laws against
>`perpetual motion machines' do not hold."
                     
Indeed, it is a little bit like that, but a system without energy sources
is a much better approximation of some real-world systems
than a world without "priors"
(i.e. without correlated answers to different questions)!
So the energy law is useful,
but models of worlds without correlated information are NOT,
except maybe that they tell us to include the correlations
properly!

Joerg Lemm
(Institute for Theoretical Physics I, University of Muenster, Germany)


