"Universal Approximators"

Tue Nov 14 12:30:26 EST 1989

YlC> Yann le Cun
JM> John Merrill (me)

JM> In fact, the radial basis function theorem gives somewhat better
JM> bounds on the *number* of intermediate nodes necessary, and, as a
JM> consequence, indicates that if you're only interested in
JM> approximation, you want to use RBF's.

YlC> But you can build RBF's (or rather, multidimensional "bumps")
YlC> with two layers of sigmoid units. And you just need O(n) units
YlC> for n bumps.  A third (linear) layer can sum up the bumps.

YlC> So i guess the bound on the number of intermediate nodes is the
YlC> same in both cases.

Not quite.  If your input lies in ${\bf R}^k$, it takes at least k units
(in the lower hidden level) to build a single k-dimensional bump in
the upper hidden level.(*) As a consequence, the network that this
argument gives is wildly more computationally demanding than the
original RBF network, since it's got to have $nk^2$ edges between the
input layer and the first hidden layer, as well as $nk$ more edges
between the two hidden layers, for a total edge count of $n (k^2 +
k)$, as compared to an edge count of $nk$ for the RBF network.  If $k
= 1000$, this is a factor of $1000$ more costly (in terms of edges;
it's actually less than that, since each RBF node needs an extra few
multiplies.)

YlC> You can prove the universality of 2 hidden layer structures that
YlC> way (and the proof is a lot simpler than with 1 hidden layer).

Absolutely, and it's the one I would actually teach if I taught one at
all.

--- John

(*) It might seem that one would need 2k such units; in fact, by using
k-dimensional simplices as the bump-shapes, k units suffice.