No subject
Geoffrey Hinton
hinton at ai.toronto.edu
Tue Sep 10 13:06:26 EDT 1991
David Wolpert writes:
"A very good example. In fact, any generalizer which acts in a somewhat
local manner (i.e., looks mostly at nearby elements in the learning
set) has the nice property that for the parity problem, the
larger the training set the *worse* the generalization off of that
training set, for precisely the reason Dr. Ghahramani gives."
His definition of "somewhat local" would seem to include K nearest neighbors.
Curiously, this can do a good job of generalizing parity from randomly
distributed examples IFF we use generalization on a validation set to
determine the optimal value of K. Each n-bit input vector has n neighbors
that differ in one bit, but on the order of n^2 neighbors that differ in two
bits (and have the same parity). So the generalization to a validation set will be best
for K small and even.
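Here is a minimal sketch of that experiment in Python/NumPy (the split sizes,
the range of K tried, and the tie-breaking rule for the majority vote are
illustrative choices, not settings from the discussion above):

import numpy as np

rng = np.random.default_rng(0)
n = 10
# All 2^n binary input vectors and their parity labels.
X = np.array([[(i >> b) & 1 for b in range(n)] for i in range(1 << n)])
y = X.sum(axis=1) % 2

# Random split into training, validation, and test sets.
perm = rng.permutation(1 << n)
train, valid, test = perm[:512], perm[512:768], perm[768:]

def knn_predict(k, queries):
    # Hamming distances from each query to every training vector.
    d = (X[queries][:, None, :] != X[train][None, :, :]).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest
    votes = y[train][nearest].mean(axis=1)     # fraction of neighbors voting "odd"
    return (votes > 0.5).astype(int)           # majority vote; ties go to "even"

# Pick K by validation accuracy, then measure generalization on the test set.
best_k = max(range(1, 16), key=lambda k: (knn_predict(k, valid) == y[valid]).mean())
print("best K on validation set:", best_k)
print("test accuracy:", (knn_predict(best_k, test) == y[test]).mean())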
He also writes:
"Interestingly, for small (number of bits small) versions of the
parity problem, backprop has exactly this property; as the learning
set grows, so does its error rate off of the learning set."
In fact, backprop generalizes parity rather well using n hidden units,
provided the number of connections, (n+1)^2, is considerably smaller than the
number of training examples. For small n, (n+1)^2 is not much smaller than
2^n, so it's not surprising that it doesn't generalize. For n=10, it generalizes
very well from 512 training examples to the rest (only 2 or 3 errors).
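For concreteness, here is a rough NumPy sketch of that kind of experiment: an
n-n-1 sigmoid network (so (n+1)^2 parameters, counting biases) trained by plain
full-batch backprop on 512 of the 1024 10-bit vectors and tested on the rest.
The learning rate, weight initialization, and number of epochs are guesses, not
the settings behind the result quoted above, and parity is notorious for
sometimes needing several random restarts:

import numpy as np

rng = np.random.default_rng(0)
n = 10
X = np.array([[(i >> b) & 1 for b in range(n)] for i in range(1 << n)], dtype=float)
y = X.sum(axis=1) % 2

perm = rng.permutation(1 << n)
train, test = perm[:512], perm[512:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# n-n-1 network: n*n + n + n + 1 = (n+1)^2 parameters counting the biases.
W1 = rng.normal(0.0, 1.0, size=(n, n)); b1 = np.zeros(n)
W2 = rng.normal(0.0, 1.0, size=n);      b2 = 0.0

lr = 0.3
for epoch in range(20000):
    h = sigmoid(X[train] @ W1 + b1)          # hidden activations
    out = sigmoid(h @ W2 + b2)               # output activation
    err = out - y[train]                     # output-layer delta (cross-entropy loss)
    # Backpropagate the deltas and take a full-batch gradient step.
    gW2 = h.T @ err / len(train); gb2 = err.mean()
    dh = np.outer(err, W2) * h * (1 - h)     # hidden-layer deltas
    gW1 = X[train].T @ dh / len(train); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

pred = (sigmoid(sigmoid(X[test] @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print("test errors:", int((pred != y[test]).sum()), "out of", len(test))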
Geoff