needed: complexity analyses of NN & evolutionary learning systems
Yann le Cun
yann at ai.toronto.edu
Fri Jul 22 14:22:00 EDT 1988
> I am looking for proofs of complexity limits for tasks learnable by
> (multilayer) PDP algorithms. Specifically, the proof that the
> generalized delta rule (aka backprop) constrains one to linearly
> independent association tasks.
Your question is a little ambiguous.
If your question is about the computational power of multi-layer networks
(independently of the learning algorithm), then it is very easy to show that
a network of sigmoid units with 2 intermediate layers can approximate
ANY continuous vector function (from R^n to R^m) as closely as you want,
provided that you put enough units in the hidden layers (the proof is in
my thesis, but really, it is trivial).
The proof is constructive, but, as always, the resulting network has no
(or little) practical interest since the number of hidden units can be
prohibitively large.
Surprisingly, for a function of a single variable, you just need one hidden
layer.
Any classification function on a finite set of patterns
is computable with two hidden layers.
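To make the approximation claim concrete, here is a small Python sketch (my
own toy illustration, not the construction in the thesis): a single hidden
layer of sigmoid units is fit to a continuous function of one variable. As a
simplification, the hidden weights and biases are fixed at random and only the
output weights are fit by least squares; the error shrinks as you add hidden
units.

    # Fit f(x) = sin(x) on [0, 2*pi] with one hidden layer of sigmoid units.
    import numpy as np

    rng = np.random.default_rng(0)
    n_hidden = 200                                    # more units -> closer fit
    x = np.linspace(0.0, 2 * np.pi, 500)[:, None]     # sample points
    y = np.sin(x).ravel()                             # target function

    slopes = rng.normal(scale=3.0, size=n_hidden)             # random hidden weights
    centers = rng.uniform(0.0, 2 * np.pi, size=n_hidden)      # random transition points
    H = 1.0 / (1.0 + np.exp(-slopes * (x - centers)))         # sigmoid hidden activations

    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)     # fit the output weights only
    print("max abs error:", np.max(np.abs(H @ w_out - y)))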
Now, if your question is about the limitations of back-prop itself (as a
learning algorithm), there is not much we know about that.
I suspect that your question had to do with SINGLE LAYER networks.
Usual single layer networks are restricted to *linearly separable functions*.
There is a nice theorem by Cover (IEEE Trans. Electronic Computers,
vol. EC-14(3), 1965) which gives the probability that a dichotomy on a set
of patterns is linearly separable.
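If it helps, the counting formula in that paper (as I recall it) says that for
N points in general position in d dimensions, the number of dichotomies
realizable by a hyperplane through the origin is
C(N,d) = 2 * sum_{k=0}^{d-1} binomial(N-1, k), so the probability that a
random dichotomy is separable is C(N,d) / 2^N. A small Python sketch:

    from math import comb

    def prob_separable(n_points, dim):
        # Cover (1965): 2 * sum_{k=0}^{dim-1} C(n_points-1, k) of the
        # 2^n_points dichotomies are realizable by a hyperplane through
        # the origin (points assumed in general position).
        c = 2 * sum(comb(n_points - 1, k) for k in range(dim))
        return c / 2 ** n_points

    # Classic consequence: at N = 2*d the probability is exactly 1/2.
    print(prob_separable(20, 10))    # -> 0.5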
Even if the desired dichotomy IS linearly separable, the delta rule (or
Widrow-Hoff rule), which only works for single layer nets, will not
necessarily find a solution. The Perceptron rule will.
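For completeness, here is a toy sketch of the Perceptron rule (my own code,
just to show the update): weights change only on misclassified patterns, and
the Perceptron convergence theorem guarantees termination with a separating
hyperplane whenever one exists. The Widrow-Hoff rule instead minimizes squared
error, and the minimum of that error need not classify every pattern correctly
even when the set is separable.

    import numpy as np

    def perceptron(X, t, max_epochs=1000):
        # X: patterns, each augmented with a constant 1 for the bias term
        # t: targets in {-1, +1}
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for x, target in zip(X, t):
                if target * (w @ x) <= 0:    # wrong side of (or on) the hyperplane
                    w += target * x          # Rosenblatt update
                    mistakes += 1
            if mistakes == 0:                # every pattern now correct
                return w
        return w                             # give up after max_epochs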
Yann le Cun. Dept of Computer Science, University of Toronto.
yann at ai.toronto.edu