compressibility and generalization
zhuh
zhuh at helios.ASTON.ac.uk
Mon Dec 4 15:33:50 EST 1995
On the implications of the No Free Lunch Theorem(s) by David Wolpert:
> From: Juergen Schmidhuber <schmidhu at informatik.tu-muenchen.de>
>
> (3) Hence, the best we may hope for is a learning technique with
> good expected generalization performance in *arbitrary* compressible
> universes. Actually, another restriction is necessary: the time
> required for compression and decompression should be ``tolerable''.
> To formalize the expression ``tolerable'' is subject of ongoing
> research.
However, the deeper NFL Theorem states that this is still impossible:
1. The *non-existence* of structure guarantees that any algorithm will
neither win nor lose, compared with the "random algorithm", in the long
run. If this were all there were to it, then NFL would be just a tautology.
2. The *mere existence* of structure guarantees that a (non-uniformly-random)
algorithm is as likely to lose you a million as to win you a million,
even in the long run. It is the *right kind* of structure that makes
a good algorithm good (see the sketch after this list).
3. This is one of the most important implications of NFL, yet my
sample from Connectionists suggests that it is safe to make the posterior
prediction that if someone criticises NFL as irrelevant, then he has not
got this far yet.
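
To make points 1 and 2 concrete, here is a minimal sketch in modern Python
(my own illustration, not part of the original post; the tiny domain, the two
fixed search orders, and the structured subsets are all hypothetical choices).
Averaged over *all* objective functions the two search orders are
indistinguishable, while on one structured subset one order wins exactly where,
on the mirror-image subset, it loses.

    from itertools import product
    from statistics import mean

    DOMAIN = list(range(4))                     # four candidate points
    ALL_FUNCTIONS = [dict(zip(DOMAIN, values))  # all 16 functions f: DOMAIN -> {0,1}
                     for values in product([0, 1], repeat=len(DOMAIN))]

    def best_after_k(order, f, k):
        # Best objective value seen after evaluating the first k points of 'order'.
        return max(f[x] for x in order[:k])

    forward  = [0, 1, 2, 3]                     # one fixed, non-repeating search order
    backward = [3, 2, 1, 0]                     # a different fixed, non-repeating order

    # Point 1: averaged over *all* functions (no structure), every search order
    # has exactly the same expected best-after-k value -- neither wins nor loses.
    for k in range(1, 5):
        a = mean(best_after_k(forward,  f, k) for f in ALL_FUNCTIONS)
        b = mean(best_after_k(backward, f, k) for f in ALL_FUNCTIONS)
        print(f"k={k}: forward={a:.3f}  backward={b:.3f}")

    # Point 2: structure alone does not help.  On one structured subset the
    # 'backward' order wins; on the mirror-image subset it loses just as badly.
    subset_hi = [f for f in ALL_FUNCTIONS if f[3] == 1]   # good value sits at point 3
    subset_lo = [f for f in ALL_FUNCTIONS if f[0] == 1]   # good value sits at point 0
    for name, subset in [("structure A", subset_hi), ("structure B", subset_lo)]:
        a = mean(best_after_k(forward,  f, 1) for f in subset)
        b = mean(best_after_k(backward, f, 1) for f in subset)
        print(f"{name}: forward={a:.2f}  backward={b:.2f}")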
In conclusion: "for an arbitrary environment there is an optimal algorithm"
is drastically different from "there is an optimal algorithm for an arbitrary
environment", whatever restrictions you place on the word "arbitrary".
--
Huaiyu Zhu, PhD email: H.Zhu at aston.ac.uk
Neural Computing Research Group http://neural-server.aston.ac.uk/People/zhuh
Dept of Computer Science ftp://cs.aston.ac.uk/neural/zhuh
and Applied Mathematics tel: +44 121 359 3611 x 5427
Aston University, fax: +44 121 333 6215
Birmingham B4 7ET, UK