interpolation vs. generalization

David Wolpert dhw at t13.Lanl.GOV
Fri Sep 6 11:20:11 EDT 1991


Thomas Hildebrandt and I probably agree on all important points, but
simply emphasize different aspects of the problem. However,

	1) One must be very careful when talking about "basis sets". There
are many generalizers (e.g., nearest-neighbor generalizers) which
cannot be expressed as forming a linear combination of basis functions
in the manner of the usual Taylor decomposition. In fact, one can prove
the following:

	Assume that we want a generalizer which obeys all the invariances
	of Euclidean space. For example, if the entire training set is
	translated in input space, then the guessed hypothesis function
	should be translated in the same manner. (We also want scaling invariance,
	rotation invariance, etc.) There are many such generalizers. However
	there does not exist a basis set of functions such that a generalizer
	which works by fitting a linear combination of the basis functions
	to the elements of the training set obeys those invariances.

There is an exception to this theorem: if the input space is one-dimensional,
then there *is* such a basis set of functions, but only one. Those functions
are just the monomials.
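
To make the contrast concrete, here is a small sketch in Python (the
particular generalizers, basis functions, and numbers are mine, chosen only
for illustration): a nearest-neighbor generalizer commutes with a translation
of the training set, whereas a least-squares fit to a fixed set of basis
functions in general does not.

import numpy as np

def nearest_neighbor(train_x, train_y, q):
    # Guess for query q: the output of the closest training input.
    i = np.argmin(np.abs(train_x - q))
    return train_y[i]

def fixed_basis_fit(train_x, train_y, q, centers=np.array([0.0, 1.0, 2.0])):
    # Least-squares fit of Gaussian bumps at *fixed* centers, evaluated at q.
    phi = lambda x: np.exp(-(x[:, None] - centers[None, :]) ** 2)
    w, *_ = np.linalg.lstsq(phi(train_x), train_y, rcond=None)
    return (phi(np.array([q])) @ w).item()

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])
shift = 3.0

# Translating the training set and the query by the same amount leaves the
# nearest-neighbor guess unchanged, but changes the fixed-basis guess.
print(nearest_neighbor(x, y, 0.4), nearest_neighbor(x + shift, y, 0.4 + shift))
print(fixed_basis_fit(x, y, 0.4), fixed_basis_fit(x + shift, y, 0.4 + shift))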

	2) Hildebrandt says: "We may consider problems for which there is no
simple transformation from the input (sensor) space into a (linear)
metric space to be "hard" problems, in a sense."
	Well, yes and no. For some generalizers, namely those which
assume such a metric space, such problems are hard. For other generalizers they
are not. A simple example is the class of fan generalizers, which can be
viewed as the multi-dimensional generalization of the non-linear time-series
technique of "embedding" a time series in a delay space. For such
generalizers, even target functions which are extremely volatile can
be generalized exceedingly well.
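
For readers not familiar with delay-space embedding, here is a rough sketch
of the construction in Python (only meant to show the embedding step, not the
particulars of fan generalizers): the scalar series is turned into
input/output pairs in a delay space, and one then generalizes in that space,
e.g. by nearest neighbor.

import numpy as np

def delay_embed(series, d):
    # Training pairs: input = d consecutive values, output = the next value.
    X = np.array([series[i:i + d] for i in range(len(series) - d)])
    y = np.array(series[d:])
    return X, y

def predict_next(series, d):
    # Guess the value following the end of the series by matching its most
    # recent delay vector against all earlier delay vectors (nearest neighbor).
    X, y = delay_embed(series, d)
    query = np.array(series[-d:])
    i = np.argmin(np.sum((X - query) ** 2, axis=1))
    return y[i]

# A volatile-looking series whose underlying dynamics are low-dimensional.
t = np.arange(200)
series = np.sin(0.3 * t) * np.cos(1.7 * t)
print(predict_next(series, d=4))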

	3) Hildebrandt also says that "Discrete problems, which naturally
inhibit interpolation, must be handled by table look-up, i.e. each case
treated separately.  However, table look-up can be considered to be an
extreme case of interpolation -- the transition between one recorded
data point and a neighboring one being governed by a Heaviside
(threshold) function rather than a straight line."
	Yes, you can view the behavior of any generalizer as "interpolation",
if you stretch the meaning of the term sufficiently. My point is that
viewing things that way for some problems (not all) amounts to a
not-very-informative tautology. For some problems, the appropriate
response to such a view is "well, yes, it is, technically speaking,
interpolation, but so what? What does such a perspective gain you?"
	Again, I *am* very sympathetic to the interpolation model. There
are many advantages of memory-based reasoning over conventional neural
nets, and (to my mind at least) not many advantages of neural nets
over memory-based reasoning. But that doesn't mean that "interpolation"
is the end of the story and we can all go home now.
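
For what it's worth, here is a tiny sketch in Python of the picture in the
quoted passage, with two recorded data points and names of my own choosing:
table look-up switches between the two recorded values with a step (Heaviside)
transition at the midpoint, while ordinary interpolation moves along the
straight line between them.

import numpy as np

x0, y0 = 0.0, 1.0   # two recorded data points
x1, y1 = 1.0, 3.0

def lookup(q):
    # Table look-up: the value of whichever recorded point is closer,
    # i.e. a Heaviside-style transition halfway between x0 and x1.
    return y0 if abs(q - x0) <= abs(q - x1) else y1

def linear(q):
    # Straight-line interpolation between the two recorded points.
    return y0 + (y1 - y0) * (q - x0) / (x1 - x0)

for q in np.linspace(0.0, 1.0, 5):
    print(f"q={q:.2f}  lookup={lookup(q):.1f}  linear={linear(q):.1f}")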

	4) "If nothing is known about the process to be modelled, is there any
more efficient way to select a basis than trial-and-error?"
	I'm not sure what you mean by "trial-and-error". Do you mean cross-
validation? If so, then the answer is yes, there are techniques better
than cross-validation. Cross-validation is a winner-takes-all strategy,
in which one picks a single generalizer and uses it. One can instead use
stacked generalization, in which one *combines* generalizers non-linearly.
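
As a bare-bones sketch of that contrast (the particulars here are mine, and
the level-1 combination is linear rather than non-linear for brevity, so this
is not a faithful rendition of the stacked-generalization papers): each
level-0 generalizer produces leave-one-out guesses on the training set;
cross-validation keeps only the single best of those generalizers, while
stacking trains a further generalizer on the guesses to combine them.

import numpy as np

def loo_guesses(generalizer, X, y):
    # Leave-one-out guesses of one level-0 generalizer on the training set.
    g = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        g[i] = generalizer(X[mask], y[mask], X[i])
    return g

def nearest(Xtr, ytr, q):
    # Level-0 generalizer: nearest neighbor.
    return ytr[np.argmin(np.abs(Xtr - q))]

def global_linear(Xtr, ytr, q):
    # Level-0 generalizer: global least-squares line.
    a, b = np.polyfit(Xtr, ytr, 1)
    return a * q + b

rng = np.random.default_rng(0)
X = np.linspace(0.0, 3.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
G = np.column_stack([loo_guesses(nearest, X, y),
                     loo_guesses(global_linear, X, y)])

# Cross-validation (winner takes all): keep the single generalizer with the
# lowest leave-one-out error and use it by itself.
winner = np.argmin(((G - y[:, None]) ** 2).mean(axis=0))

# Stacked generalization: learn a combination of the level-0 guesses from the
# same leave-one-out data (linear here only to keep the sketch short).
weights, *_ = np.linalg.lstsq(G, y, rcond=None)
print("winner-takes-all picks generalizer", winner, "; stacking weights:", weights)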

	For anyone who is interested, I can mail reprints of my papers which
discuss some of the points above.



David Wolpert (dhw at tweety.lanl.gov)

