No subject
Geoffrey Hinton
hinton at ai.toronto.edu
Tue Sep 10 13:06:26 EDT 1991
David Wolpert writes:
"A very good example. In fact, any generalizer which acts in a somewhat
local manner (i.e., looks mostly at nearby elements in the learning
set) has the nice property that for the parity problem, the
larger the training set the *worse* the generalization off of that
training set, for precisely the reason Dr. Ghahramani gives."
His definition of "somewhat local" would seem to include K nearest neighbors.
Curiously, this can do a good job of generalizing parity from randomly
distributed examples IFF we use generalization on a validation set to
determine the optimal value of K. Each n-bit input vector has n neighbors
that differ in one bit, but on the order of n^2 neighbors that differ in two
bits (and have the same parity). So the generalization to a validation set will be best
for K small and even.
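Here is a minimal sketch of that experiment in Python/NumPy (the split sizes,
the range of K tried, and the tie-breaking rule for the majority vote are
illustrative choices, not settings from the discussion above):

import numpy as np

rng = np.random.default_rng(0)
n = 10
# All 2^n binary input vectors and their parity labels.
X = np.array([[(i >> b) & 1 for b in range(n)] for i in range(1 << n)])
y = X.sum(axis=1) % 2

# Random split into training, validation, and test sets.
perm = rng.permutation(1 << n)
train, valid, test = perm[:512], perm[512:768], perm[768:]

def knn_predict(k, queries):
    # Hamming distances from each query to every training vector.
    d = (X[queries][:, None, :] != X[train][None, :, :]).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest
    votes = y[train][nearest].mean(axis=1)     # fraction of neighbors voting "odd"
    return (votes > 0.5).astype(int)           # majority vote; ties go to "even"

# Pick K by validation accuracy, then measure generalization on the test set.
best_k = max(range(1, 16), key=lambda k: (knn_predict(k, valid) == y[valid]).mean())
print("best K on validation set:", best_k)
print("test accuracy:", (knn_predict(best_k, test) == y[test]).mean())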
He also writes:
"Interestingly, for small (number of bits small) versions of the
parity problem, backprop has exactly this property; as the learning
set grows, so does its error rate off of the learning set."
In fact, backprop generalizes parity rather well using n hidden units,
provided the number of connections, (n+1)^2, is considerably smaller than the
number of training examples. For small n, (n+1)^2 is not much smaller than
2^n, so it's not surprising that it doesn't generalize. For n=10, it generalizes
very well from 512 training examples to the rest (only 2 or 3 errors).
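For concreteness, here is a rough NumPy sketch of that kind of experiment: an
n-n-1 sigmoid network (so (n+1)^2 parameters, counting biases) trained by plain
full-batch backprop on 512 of the 1024 10-bit vectors and tested on the rest.
The learning rate, weight initialization, and number of epochs are guesses, not
the settings behind the result quoted above, and parity is notorious for
sometimes needing several random restarts:

import numpy as np

rng = np.random.default_rng(0)
n = 10
X = np.array([[(i >> b) & 1 for b in range(n)] for i in range(1 << n)], dtype=float)
y = X.sum(axis=1) % 2

perm = rng.permutation(1 << n)
train, test = perm[:512], perm[512:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# n-n-1 network: n*n + n + n + 1 = (n+1)^2 parameters counting the biases.
W1 = rng.normal(0.0, 1.0, size=(n, n)); b1 = np.zeros(n)
W2 = rng.normal(0.0, 1.0, size=n);      b2 = 0.0

lr = 0.3
for epoch in range(20000):
    h = sigmoid(X[train] @ W1 + b1)          # hidden activations
    out = sigmoid(h @ W2 + b2)               # output activation
    err = out - y[train]                     # output-layer delta (cross-entropy loss)
    # Backpropagate the deltas and take a full-batch gradient step.
    gW2 = h.T @ err / len(train); gb2 = err.mean()
    dh = np.outer(err, W2) * h * (1 - h)     # hidden-layer deltas
    gW1 = X[train].T @ dh / len(train); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

pred = (sigmoid(sigmoid(X[test] @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print("test errors:", int((pred != y[test]).sum()), "out of", len(test))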
Geoff