"Universal Approximators"

Tue Nov 14 17:15:25 EST 1989

John Merrill says:
  >If your input lies in ${\bf R}^k$, it takes at least k units
  >(in the lower hidden level) to build a single k-dimensional bump in
  >the upper hidden level

True, although, as you say, it is easier with 2k units.
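For concreteness, here is a minimal sketch of one way the 2k-unit bump could be wired. It assumes sigmoid units and an axis-aligned, box-shaped bump. The names `bump`, `half_width`, `steepness` and `gain` are hypothetical and do not come from the original exchange. For each input coordinate, one unit switches on at the lower edge of the box and another switches off at the upper edge. Each uses only a bias and a single input weight. The upper unit fires only when nearly all 2k of them are on.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bump(x, center, half_width, steepness=20.0, gain=20.0):
    """Hypothetical k-dimensional box bump built from 2k first-level units.

    Every first-level unit has exactly 2 incoming weights: one on a single
    input coordinate x[i] and one bias. No unit sees all k inputs.
    """
    k = len(center)
    total = 0.0
    for i in range(k):
        # Rising edge: weight +s on x[i], bias -s * (c_i - w).
        total += sigmoid(steepness * x[i] - steepness * (center[i] - half_width))
        # Falling edge: weight -s on x[i], bias +s * (c_i + w).
        total += sigmoid(-steepness * x[i] + steepness * (center[i] + half_width))
    # Upper-level unit: on only if (almost) all 2k edge units are on.
    return sigmoid(gain * (total - (2 * k - 0.5)))
```

Larger `steepness` gives sharper edges. Because every unit is a smooth sigmoid, the whole construction stays differentiable.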

  >As a consequence, the network that this
  >argument gives is wildly more computationally demanding than the
  >original RBF network, since it's got to have $nk^2$ edges between the
  >input layer and the first hidden layer

Not true, since each of these 2k units needs only 2 incoming weights (not k):
one for the bias and one from a single input (*).
Thus the total number of edges is about 6nk (4nk into the first hidden level
plus 2nk into the bump units), just 6 times more than the nk of a regular RBF
network. It can be even better than that (almost 2nk) if your bumps are
regularly spaced, since they can then share the first-level units.
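The edge counts can be checked with a few lines of arithmetic. This sketch ignores the bias of each upper (bump) unit, as the estimates above do. For the regularly spaced case it assumes a hypothetical grid with m centers per axis (so n = m^k bumps) and 2 shared first-level units per grid coordinate per axis. The function names are illustrative only.

```python
def edge_counts(n, k):
    """Edge counts for n bumps over k inputs (bump-unit biases ignored)."""
    return {
        # k first-level units per bump, each wired to all k inputs
        # (input-to-first-hidden-level edges only, as in the quoted estimate).
        "k_units_full_fan_in": n * k * k,
        # 2k first-level units per bump with 2 weights each (bias + one input)
        # = 4k, plus 2k weights from those units into the bump unit.
        "2k_units_two_weights": 6 * n * k,
        # Plain RBF unit: one weight (center coordinate) per input.
        "plain_rbf": n * k,
    }


def grid_edge_count(m, k):
    """Hypothetical regular grid: first-level units are shared by all bumps."""
    n = m ** k
    shared_first_level = 2 * m * k              # shared across all bumps
    into_first_level = 2 * shared_first_level    # bias + one input each
    into_bumps = 2 * k * n                       # each bump still reads 2k units
    return into_first_level + into_bumps


print(edge_counts(100, 10))
print(grid_edge_count(10, 3), 2 * 1000 * 3)  # 6120 vs 2nk = 6000
```

As the grid grows, the shared first-level term (4mk) becomes negligible next to 2nk.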

And you can back-propagate through the whole thing.

  -- Yann Le Cun

(*) You might want k incoming weights if you absolutely need
    non-symmetric, rotated RBFs; otherwise 2 is enough.