NFL and practice


Mon Jun 5 16:42:55 EDT 2006


Joerg Lemm wrote
> One may discuss NFL for theoretical reasons, but
> the conditions under which NFL-Theorems hold
> are not those which are normally met in practice.

Exactly the opposite.  The theory behind NFL is trivial (in some sense).
The power of NFL is that it deals directly with what is routinely 
practiced in the neural network community today.

> 1.) In short, NFL assumes that data, i.e. information of the form y_i=f(x_i),
> do not contain information about function values on a non-overlapping test set.
> This is done by postulating "unrestricted uniform" priors, 
> or uniform hyperpriors over nonuniform priors... (with respect to Craig's 
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> two cases this average would include a third case: target and model are 
> anticorrelated so anticrossvalidation works better) and "vertical" likelihoods.
> So, in an NFL setting the data never say anything about function values 
> for new arguments.
> This seems rather trivial under this assumption and one has to ask
> how natural such an NFL situation is.

This is indeed a very trivial and unnatural assumption, which has been 
criticised by generations of statisticians over several centuries.  
However, it is exactly what is practiced by a majority of NN researchers.
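
To put the assumption in symbols (my notation, just a paraphrase of the
setting Joerg describes): with a prior that is uniform over all functions,
the data tell you nothing about f at any new argument,

    P( f(x') = y | f(x_1) = y_1, ..., f(x_n) = y_n )  =  P( f(x') = y )

for every x' outside {x_1, ..., x_n}.  Off the training set, every algorithm
is then just guessing.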

Consider the claim: "This is an algorithm which will perform well as long
as there is some nonuniform prior".  

If such a claim could ever be true, then the algorithm would also be 
good for a uniform hyperprior over nonuniform priors. But this is in 
direct contradiction to NFL.

According to NFL, you have to say: "This is an algorithm which will perform
well on this particular nonuniform prior (and hence it will perform badly on
that particular nonuniform prior)".
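
To see this concretely, one can enumerate every Boolean target on a tiny
domain and average the off-training-set error of any two learners over all
of them: the learners always tie.  A rough sketch (the domain, the split and
the two toy learners are my own illustrative choices):

    # Toy check of NFL: averaged uniformly over all Boolean targets on a
    # small domain, any two learners have the same off-training-set error.
    from itertools import product

    X = [0, 1, 2, 3]
    train_x = [0, 1]                      # training inputs
    test_x = [2, 3]                       # off-training-set inputs

    def majority_learner(train):          # predict the commoner training label
        guess = int(2 * sum(train.values()) >= len(train))
        return lambda x: guess

    def anti_majority_learner(train):     # predict the opposite label
        guess = int(2 * sum(train.values()) < len(train))
        return lambda x: guess

    def ots_error(learner, f):            # error on the off-training-set inputs
        h = learner({x: f[x] for x in train_x})
        return sum(h(x) != f[x] for x in test_x) / len(test_x)

    for learner in (majority_learner, anti_majority_learner):
        total = sum(ots_error(learner, dict(zip(X, labels)))
                    for labels in product([0, 1], repeat=len(X)))
        print(learner.__name__, total / 2 ** len(X))    # both print 0.5

Substitute whatever learner you like, cross-validation, anti-cross-validation
or a trained network, and the uniform average stays at 0.5.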

Similarly, with the Law of Energy Conservation, if you say "I've designed 
a machine to generate electricity", then you automatically imply that you 
have designed a machine to consume some other form of energy.

You can't make every term positive in your balance sheet if the grand
total is bound to be zero.


Joerg continued with examples of various priors of practical concern,
including smoothness, symmetry, positive correlation, iid samples, etc.
These are indeed very important priors which match the real world,
and they are the implicit assumptions behind most algorithms.

What NFL tells us is: If your algorithm is designed for such a prior,
then say so explicitly so that a user can decide whether to use it.
You can't expect it also to be good for any other prior which you have
not considered.  In fact, in a sense, you should expect it to perform
worse than a purely random algorithm on those other priors.
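
A small sketch of that last point (the two priors and the nearest-neighbour
learner are my own toy choices): a learner built for a smoothness prior is
nearly perfect when neighbouring labels tend to agree, and worse than a coin
flip when the prior makes neighbouring labels disagree.

    # A learner designed for a smoothness prior beats coin-flipping under
    # that prior, and loses to it under the opposite ("anti-smooth") prior.
    import random

    random.seed(0)
    X = list(range(10))
    train_x = [x for x in X if x % 2 == 0]
    test_x = [x for x in X if x % 2 == 1]

    def smooth_target():        # neighbouring labels agree in long runs
        f, label = {}, random.randint(0, 1)
        for x in X:
            if random.random() < 0.1:             # occasional label change
                label = 1 - label
            f[x] = label
        return f

    def antismooth_target():    # neighbouring labels always disagree
        start = random.randint(0, 1)
        return {x: (start + x) % 2 for x in X}

    def nn_predict(train, x):   # smoothness assumption: copy the nearest label
        return train[min(train, key=lambda t: abs(t - x))]

    for name, sample in (("smooth", smooth_target),
                         ("anti-smooth", antismooth_target)):
        nn_err = coin_err = 0.0
        for _ in range(2000):
            f = sample()
            train = {x: f[x] for x in train_x}
            nn_err += sum(nn_predict(train, x) != f[x]
                          for x in test_x) / len(test_x)
            coin_err += sum(random.randint(0, 1) != f[x]
                            for x in test_x) / len(test_x)
        print(name, " nearest-neighbour:", nn_err / 2000,
              " coin flip:", coin_err / 2000)

The learner's gain under the prior it was designed for is paid for under
the prior it was not.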

> To conclude: 
> 
> In many interesting cases "effective" function values contain information 
> about other function values and NFL does not hold!

This is like saying "In many interesting cases we do have energy sources,
and we can make a machine running forever, so the natural laws against
`perpetual motion machines' do not hold."  

These general principles might not seem obviously interesting to a user,
but they are of fundamental importance to a researcher.  They are in fact
also of fundamental importance to a user, as he must assume the
responsibility of supplying the energy source or specifying the prior.


--
Huaiyu Zhu, PhD                   email: H.Zhu at aston.ac.uk
Neural Computing Research Group   http://neural-server.aston.ac.uk/People/zhuh
Dept of Computer Science          ftp://cs.aston.ac.uk/neural/zhuh
    and Applied Mathematics       tel: +44 121 359 3611 x 5427
Aston University,                 fax: +44 121 333 6215
Birmingham B4 7ET, UK              




