NFL Summary
Michael Perrone
mpp at watson.ibm.com
Fri Dec 8 19:27:29 EST 1995
Hi Everyone,
There has been a lot of confusion regarding the "No Free Lunch" theorems.
Below, I try to summarize what I feel to be the key points.
NFL in a Nutshell:
------------------
If you make no assumptions about the target function, then, averaged over
all possible target functions, all learning algorithms have the same
generalization performance.
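To make this concrete, here is a toy Python sketch (my own illustration,
not from Wolpert's papers): enumerate every boolean target function on a
four-point input space and average the off-training-set error of two very
different learners; both come out to exactly 0.5.

from itertools import product

X = [0, 1, 2, 3]          # the entire input space
train_x = [0, 1]          # training inputs (labels come from the target)
test_x = [2, 3]           # off-training-set inputs we are scored on

def learner_zero(train_labels, x):
    # Always predicts 0 at unseen points (training labels are ignored
    # because only the unseen points are scored here).
    return 0

def learner_one(train_labels, x):
    # Always predicts 1 at unseen points.
    return 1

def avg_ots_error(learner):
    # Average off-training-set error, uniformly over all 16 boolean targets.
    errors = []
    for target in product([0, 1], repeat=len(X)):
        train_labels = [target[i] for i in train_x]
        err = sum(learner(train_labels, x) != target[x] for x in test_x) / len(test_x)
        errors.append(err)
    return sum(errors) / len(errors)

print(avg_ots_error(learner_zero))   # 0.5
print(avg_ots_error(learner_one))    # 0.5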
Apparent Contradiction and Resolution:
--------------------------------------
Contradiction: Many theoretical results regarding generalization
claim to make no assumptions about the target function.
Resolution: These theoretical results DO make assumptions (which may
or may not be explicit) regarding the target.
Importance of NFL:
------------------
The NFL result, in and of itself, is not terribly interesting because
its assumption (that we make no assumptions) is NEVER true.
What makes NFL important is that it emphasizes in a very striking way
that it is the ASSUMPTIONS that we make about our learning domains
that MAKE ALL THE DIFFERENCE.
Therefore, I see NFL *NOT* as a criticism of theoretical generalization
results, but rather as a call to examine the assumptions underlying
these results, because it is there that we can potentially learn the
most about machine learning.
Examples of Unstated Assumptions:
---------------------------------
In practice, there are numerous assumptions that we as a community
usually make when we attempt to learn a task using our favorite
algorithm. Below, I list just a few obvious ones.
1) The training and testing data are IID.
2) The data distribution is "smooth" (i.e. "near" data points are
in general more similar than "far" data points). This can also be
interpreted as some differentiability conditions.
3) NN's approximate real-world functions reasonably well.
4) Starting with small initial weights is good.
5) Overfitting is bad - early stopping is good. (A rough sketch of this
one appears after the list.)
6) Gaussian error models are the best thing since machine sliced bread.
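Purely as an illustration of assumption (5), here is a rough Python/numpy
sketch (the data, model, and stopping rule are all made up for the
example): fit an over-flexible polynomial by gradient descent and stop
when the held-out validation error stops improving.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.3 * rng.standard_normal(40)   # noisy target

def features(x):
    # Degree-9 polynomial features: flexible enough to overfit 30 points.
    return np.vstack([x ** k for k in range(10)]).T

X_train, y_train = features(x[:30]), y[:30]
X_val, y_val = features(x[30:]), y[30:]

w = np.zeros(X_train.shape[1])
best_w, best_val, patience = w.copy(), np.inf, 0

for step in range(100000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.05 * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val - 1e-6:
        best_val, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience > 500:   # validation error has stopped improving
            break

print("stopped at step", step, "with validation MSE", best_val)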
REALLY INTERESTING STUFF:
-------------------------
I think that the NFL results point towards what I feel are extremely
interesting research topics:
Exactly what are the assumptions that certain theoretical results
require?
Exactly how do these assumptions affect generalization?
Which assumptions are necessary/sufficient?
How do different assumptions compare?
Can we identify a set of assumptions that are equivalent to the
assumption that CV model selection improves generalization?
Can we do the same for early stopping? Bagging?
(You can be damn sure I can do this for averaging; see the toy check
after this list... :-)
Etc, etc, ...
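As a toy check of the averaging case (my own sketch, not the actual
derivation): the squared error of the simple average of several predictors
is never worse than the average of their individual squared errors, by
convexity of the squared loss.

import numpy as np

rng = np.random.default_rng(1)
target = rng.standard_normal(1000)                   # true values
predictors = [target + rng.standard_normal(1000) for _ in range(5)]

individual_mse = np.mean([np.mean((p - target) ** 2) for p in predictors])
ensemble_mse = np.mean((np.mean(predictors, axis=0) - target) ** 2)

print("mean individual MSE:", individual_mse)   # roughly 1.0
print("ensemble (average) MSE:", ensemble_mse)  # smaller, roughly 0.2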
Caveat:
-------
All of the above is conditioned on the assumption that David Wolpert
did his math correctly when deriving the NFL theorems... :-)
I hope all of this helps clear things up.
Comments?
Regards,
Michael
-------------------------------------------------------------------------
Michael P. Perrone 914-945-1779 (office)
IBM - Thomas J. Watson Research Center 914-945-4010 (fax)
P.O. Box 704 / Rm 36-207 914-245-9746 (home)
Yorktown Heights, NY 10598 mpp at watson.ibm.com
-------------------------------------------------------------------------