NFL Summary
Michael Perrone
mpp at watson.ibm.com
Fri Dec 8 19:27:29 EST 1995
Hi Everyone,
There has been a lot of confusion regarding the "No Free Lunch" theorems.
Below, I try to summarize what I feel to be the key points.
NFL in a Nutshell:
------------------
If you make no assumptions about the target function, then, averaged over
all possible target functions, all learning algorithms have the same
generalization performance.
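To make this concrete, here is a toy Python sketch (my own illustration,
not from Wolpert's papers): enumerate every boolean target function on a
four-point input space and average the off-training-set error of two very
different learners; both come out to exactly 0.5.

from itertools import product

X = [0, 1, 2, 3]          # the entire input space
train_x = [0, 1]          # training inputs (labels come from the target)
test_x = [2, 3]           # off-training-set inputs we are scored on

def learner_zero(train_labels, x):
    # Always predicts 0 at unseen points (training labels are ignored
    # because only the unseen points are scored here).
    return 0

def learner_one(train_labels, x):
    # Always predicts 1 at unseen points.
    return 1

def avg_ots_error(learner):
    # Average off-training-set error, uniformly over all 16 boolean targets.
    errors = []
    for target in product([0, 1], repeat=len(X)):
        train_labels = [target[i] for i in train_x]
        err = sum(learner(train_labels, x) != target[x] for x in test_x) / len(test_x)
        errors.append(err)
    return sum(errors) / len(errors)

print(avg_ots_error(learner_zero))   # 0.5
print(avg_ots_error(learner_one))    # 0.5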
Apparent Contradiction and Resolution:
--------------------------------------
Contradiction: Many theoretical results regarding generalization
claim to make no assumptions about the target function.
Resolution: These theoretical results DO make assumptions (which may
or may not be explicit) regarding the target.
Importance of NFL:
------------------
The NFL result, in and of itself, is not terribly interesting because
its assumption (that we make no assumptions) is NEVER true.
What makes NFL important is that it emphasizes in a very striking way
that it is the ASSUMPTIONS that we make about our learning domains
that MAKE ALL THE DIFFERENCE.
Therefore, I see NFL *NOT* as a criticism of theoretical generalization
results, but rather as a call to examine the assumptions underlying
these results, because it is there that we can potentially learn the
most about machine learning.
Examples of Unstated Assumptions:
---------------------------------
In practice, there are numerous assumptions that we as a community
usually make when we attempt to learn a task using our favorite
algorithm. Below, I list just a few obvious ones.
1) The training and testing data are IID.
2) The data distribution is "smooth" (i.e. "near" data points are
in general more similar than "far" data points). This can also be
interpreted as some differentiability conditions.
3) NN's approximate real-world functions reasonably well.
4) Starting with small initial weights is good.
5) Overfitting is bad - early stopping is good. (A rough sketch of this
one appears after the list.)
6) Gaussian error models are the best thing since machine sliced bread.
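Purely as an illustration of assumption (5), here is a rough Python/numpy
sketch (the data, model, and stopping rule are all made up for the
example): fit an over-flexible polynomial by gradient descent and stop
when the held-out validation error stops improving.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.3 * rng.standard_normal(40)   # noisy target

def features(x):
    # Degree-9 polynomial features: flexible enough to overfit 30 points.
    return np.vstack([x ** k for k in range(10)]).T

X_train, y_train = features(x[:30]), y[:30]
X_val, y_val = features(x[30:]), y[30:]

w = np.zeros(X_train.shape[1])
best_w, best_val, patience = w.copy(), np.inf, 0

for step in range(100000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.05 * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val - 1e-6:
        best_val, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience > 500:   # validation error has stopped improving
            break

print("stopped at step", step, "with validation MSE", best_val)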
REALLY INTERESTING STUFF:
-------------------------
I think that the NFL results point towards what I feel are extremely
interesting research topics:
Exactly what are the assumptions that certain theoretical results
require?
Exactly how do these assumptions affect generalization?
Which assumptions are necessary/sufficient?
How do different assumptions compare?
Can we identify a set of assumptions that are equivalent to the
assumption that CV model selection improves generalization?
Can we do the same for early stopping? Bagging?
(You can be damn sure I can do this for averaging; see the toy check
after this list... :-)
Etc, etc, ...
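As a toy check of the averaging case (my own sketch, not the actual
derivation): the squared error of the simple average of several predictors
is never worse than the average of their individual squared errors, by
convexity of the squared loss.

import numpy as np

rng = np.random.default_rng(1)
target = rng.standard_normal(1000)                   # true values
predictors = [target + rng.standard_normal(1000) for _ in range(5)]

individual_mse = np.mean([np.mean((p - target) ** 2) for p in predictors])
ensemble_mse = np.mean((np.mean(predictors, axis=0) - target) ** 2)

print("mean individual MSE:", individual_mse)   # roughly 1.0
print("ensemble (average) MSE:", ensemble_mse)  # smaller, roughly 0.2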
Caveat:
-------
All of the above is conditioned on the assumption that David Wolpert
did his math correctly when deriving the NFL theorems... :-)
I hope all of this helps clear things up.
Comments?
Regards,
Michael
-------------------------------------------------------------------------
Michael P. Perrone 914-945-1779 (office)
IBM - Thomas J. Watson Research Center 914-945-4010 (fax)
P.O. Box 704 / Rm 36-207 914-245-9746 (home)
Yorktown Heights, NY 10598 mpp at watson.ibm.com
-------------------------------------------------------------------------