Response to no-free-lunch discussion

David Wolpert dhw at santafe.edu
Tue Nov 14 13:24:58 EST 1995


Some quick comments on the recent discussion of no-free-lunch (NFL)
issues. As an aside, it's interesting to note how "all over the map"
the discussion is, ranging from people who wholeheartedly agree with
NFL to people who make claims diametrically opposed to it.


****



Simon Perkins writes:

>>>
some people claim that there _is_ a general
sub-class of problems for which it's at all feasible to find a
solution - perhaps problems whose solutions have low Kolmogorov
complexity or something - and this might well mean that there _is_ a
general good learning algorithm for this class of `interesting
problems'.

Any comments?
>>>

There's no disputing this. Restricting attention to a sub-class of
problems is formally equivalent to placing a restriction on the target
and/or prior over targets (in the latter case, placing a restriction
on the prior's support).

Certainly once things are restricted this way, we are in the domain of
Bayesian analysis (and/or some versions of PAC), and not all
algorithms are the same.

However, we have *no* a priori reason to believe that targets with "low
Kolmogorov complexity" (or anything else) are, or are not, likely to
occur in the real world.

"Well, nothing can be proven from first principles to work well," you
might say. This actually isn't always true (the loss function is
important, there are minimax issues, etc.). But even in the simple
scenarios in which this sentiment is essentially correct (i.e., the
scenarios in which NFL holds), there is a huge body of literature
which purports to prove from first principles that some algorithms
*do* work better than others, without any assumption about the
targets:


***

E.g., claims that so long as the VC dimension of your algorithm is
low, the training set is large, and the misclassification rate on the
training set is small, then *independent of assumptions concerning the
target*, you can bound the generalization error (a typical form of such
a bound is written out below).

Or claims that boosting can only help generalization error, regardless
of the prior over targets.

Or the PAC "proof" of Occam's razor (which, absurdly, "holds" for
any and all complexity measures).
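
To make the first of these concrete: a typical statement of such a
bound (the exact constants vary from version to version, so take this
only as an illustration of the general form) is that, with probability
at least 1 - eta over the draw of the m training examples,

   err(h)  <=  err_train(h) + sqrt( [d*(ln(2m/d) + 1) + ln(4/eta)] / m )

where d is the VC dimension of the hypothesis class and err_train(h)
is the misclassification rate on the training set. No assumption
about the target appears anywhere in the statement.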

NFL results show all such claims to be problematic, at best. The
difficulty is not with the math behind these claims, but rather with
the interpretations of what that math means.

***


Dimitris Tsioutsias writes:

>>>
As any thoughtful person in the greater mathematical programming 
community might point out, there's no general optimization method
that could outperform any other on any kind of problem.
>>>

Obviously. But to give a simple example, before now, no such
"thoughtful person" would have had any idea whether there might be
a "general optimization method" that only rarely performs worse than
random search.

Addressing such issues is one of the (more trivial) things NFL can do
for you.
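
For reference, the search version of the result, loosely stated: for
any two algorithms a1 and a2, summing over all cost functions f, the
probability of generating any particular sample d of cost values at
the m distinct points visited is identical,

   sum_f P(d | f, m, a1)  =  sum_f P(d | f, m, a2),

so any performance measure that depends only on that sample is, when
averaged uniformly over all f, the same for every algorithm, random
search included.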



***

Finally, Juergen Schmidhuber writes:

>>>
We already know that for a wide variety of non-incremental search 
problems, there *is* a theoretically optimal algorithm: Levin's 
universal search algorithm 
>>>

I have lots of respect for Juergen's work, but on this issue, I have
to disagree with him. Simply put, supervised learning (and in many
regards search) is an exercise in statistics, not in algorithmic
information theory. NFL *proves* this.

In practice, it may (or may not) be a good idea to use an algorithm
that searches for low Levin complexity rather than one that works by
other means. But there is simply no first-principles reason for
believing so. Will such an algorithm beat backprop? Maybe, maybe
not. It depends on things that cannot be proven from first
principles (as well as on a precise definition of "beat", etc.).
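
For readers who haven't seen the construction, here is a minimal
sketch of the kind of procedure under discussion, written in Python
purely as an illustration (it is not anyone's actual implementation).
The "programs" enumeration, its step-limited run() calls, and the
is_solution() verifier are hypothetical stand-ins for a real
interpreter and problem definition.

    # Minimal illustrative sketch of Levin-style universal search.
    # `programs` is a list of (run, length) pairs: run(budget) executes
    # a candidate program for at most `budget` steps and returns a
    # proposed solution or None; `length` is the candidate's description
    # length in bits.  `is_solution` verifies a proposed solution.
    def levin_search(programs, is_solution, max_phase=32):
        for phase in range(1, max_phase + 1):
            for run, length in programs:
                if length > phase:
                    continue
                # Shorter programs get exponentially larger time budgets:
                # about 2**(phase - length) steps in this phase.
                budget = 2 ** (phase - length)
                candidate = run(budget)
                if candidate is not None and is_solution(candidate):
                    return candidate
        return None

The built-in bias toward short programs is exactly the "low Levin
complexity" preference at issue; whether that preference matches the
problems one actually encounters is precisely what cannot be settled
a priori.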

The distribution over the set of problems we encounter in the real
world is governed by an extraordinarily complicated interplay between
physics, chemistry, biology, psychology, sociology and
economics. There is no a priori reason for believing that this
interplay respects notions of algorithmic information complexity,
time-bounded or otherwise.







David Wolpert

