needed: complexity analyses of NN & evolutionary learning systems
Yann le Cun
yann at ai.toronto.edu
Fri Jul 22 14:22:00 EDT 1988
> I am looking for proofs of complexity limits for tasks learnable by
> (multilayer) PDP algorithms. Specifically, the proof that the
> generalized delta rule (aka backprop) constrains one to linearly
> independent association tasks.
Your question is a little ambiguous.
If your question is about the computational power of multi-layer networks
(independently of the learning algorithm), then it is very easy to show that
a network of sigmoid units with 2 intermediate layers can approximate
ANY continuous vector function (from R^n to R^m) as closely as you want,
provided that you put enough units in the hidden layers (the proof is in
my thesis, but really, it is trivial).
The proof is constructive, but, as always, the resulting network has no
(or little) practical interest since the number of hidden units can be
prohibitively large.
Surprisingly, for a function of a single variable, you just need one hidden
layer.
Any classification function on a finite set of patterns
is computable with two hidden layers.
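To make the approximation claim concrete, here is a small Python sketch (my
own toy illustration, not the construction in the thesis): a single hidden
layer of sigmoid units is fit to a continuous function of one variable. As a
simplification, the hidden weights and biases are fixed at random and only the
output weights are fit by least squares; the error shrinks as you add hidden
units.

    # Fit f(x) = sin(x) on [0, 2*pi] with one hidden layer of sigmoid units.
    import numpy as np

    rng = np.random.default_rng(0)
    n_hidden = 200                                    # more units -> closer fit
    x = np.linspace(0.0, 2 * np.pi, 500)[:, None]     # sample points
    y = np.sin(x).ravel()                             # target function

    slopes = rng.normal(scale=3.0, size=n_hidden)             # random hidden weights
    centers = rng.uniform(0.0, 2 * np.pi, size=n_hidden)      # random transition points
    H = 1.0 / (1.0 + np.exp(-slopes * (x - centers)))         # sigmoid hidden activations

    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)     # fit the output weights only
    print("max abs error:", np.max(np.abs(H @ w_out - y)))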
Now, if your question is about the limitations of back-prop itself (as a
learning algorithm), there is not much we know about that.
I suspect that your question had to do with SINGLE LAYER networks.
Usual single layer networks are restricted to *linearly separable functions*.
There is a nice theorem by Cover (IEEE Trans. Electronic Computers,
vol. EC-14(3), 1965) which gives the probability that a dichotomy on a set
of patterns is linearly separable.
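If it helps, the counting formula in that paper (as I recall it) says that for
N points in general position in d dimensions, the number of dichotomies
realizable by a hyperplane through the origin is
C(N,d) = 2 * sum_{k=0}^{d-1} binomial(N-1, k), so the probability that a
random dichotomy is separable is C(N,d) / 2^N. A small Python sketch:

    from math import comb

    def prob_separable(n_points, dim):
        # Cover (1965): 2 * sum_{k=0}^{dim-1} C(n_points-1, k) of the
        # 2^n_points dichotomies are realizable by a hyperplane through
        # the origin (points assumed in general position).
        c = 2 * sum(comb(n_points - 1, k) for k in range(dim))
        return c / 2 ** n_points

    # Classic consequence: at N = 2*d the probability is exactly 1/2.
    print(prob_separable(20, 10))    # -> 0.5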
Even if the desired dichotomy IS linearly separable, the delta rule (or
Widrow-Hoff rule), which only works for single layer nets, will not
necessarily find a solution. The Perceptron rule will.
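For completeness, here is a toy sketch of the Perceptron rule (my own code,
just to show the update): weights change only on misclassified patterns, and
the Perceptron convergence theorem guarantees termination with a separating
hyperplane whenever one exists. The Widrow-Hoff rule instead minimizes squared
error, and the minimum of that error need not classify every pattern correctly
even when the set is separable.

    import numpy as np

    def perceptron(X, t, max_epochs=1000):
        # X: patterns, each augmented with a constant 1 for the bias term
        # t: targets in {-1, +1}
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for x, target in zip(X, t):
                if target * (w @ x) <= 0:    # wrong side of (or on) the hyperplane
                    w += target * x          # Rosenblatt update
                    mistakes += 1
            if mistakes == 0:                # every pattern now correct
                return w
        return w                             # give up after max_epochs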
Yann le Cun. Dept of Computer Science, University of Toronto.
yann at ai.toronto.edu