No subject

Geoffrey Hinton hinton at cs.toronto.edu
Mon Mar 15 13:00:18 EST 1993


Wolpert says:

>All "local" algorithms (e.g., weighted nearest neighbor) perform
>badly on parity. More precisely, as the number of training examples
>goes up, their off-training set generalization *degrades*, asymptoting
>at 100% errors. This is also true for backprop run on neural nets, at
>least for n = 6.

This seems very plausible, but it's not quite right.

First, consider K nearest neighbors, where a validation set is used to pick K
and all K neighbors get to vote on the answer.  It seems fair to call this a
"local" algorithm.  If the training data contains a fraction p of all the
possible cases of n-bit parity, then each novel test case will have about pn
neighbors in the training set that differ by one bit.  It will also have about
pn(n-1)/2 training neighbors that differ by 2 bits, and so on.  Cases that
differ by one bit have the opposite parity, but cases that differ by two bits
have the same parity as the test case.  So for reasonably large n we get
correct performance by picking K so that we tend to get all the training cases
that differ by 1 bit and most of the far more numerous training cases that
differ by 2 bits: the same-parity votes swamp the opposite-parity ones.  For
very small values of p we need to consider more distant neighbors to get this
effect to work, and this requires larger values of n.
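
Here is a minimal sketch of this counting argument, assuming plain
Hamming-distance K nearest neighbors with an unweighted majority vote.  The
values of n, p, and K below are illustrative assumptions, not settings from
any original experiment:

    import itertools, random

    def parity(x):
        return sum(x) % 2

    def knn_parity_error(n=10, p=0.5, K=None, seed=0):
        rng = random.Random(seed)
        cases = list(itertools.product((0, 1), repeat=n))
        rng.shuffle(cases)
        m = int(p * len(cases))
        train, test = cases[:m], cases[m:]
        if K is None:
            # enough neighbors to cover the ~pn 1-bit cases and
            # most of the ~pn(n-1)/2 2-bit cases discussed above
            K = round(p * n + p * n * (n - 1) / 2)
        errors = 0
        for x in test:
            # sort training cases by Hamming distance to the test case
            nearest = sorted(train,
                             key=lambda t: sum(a != b for a, b in zip(x, t)))[:K]
            votes = sum(parity(t) for t in nearest)
            pred = 1 if 2 * votes > K else 0
            errors += (pred != parity(x))
        return errors / len(test)

    # low error: the same-parity 2-bit neighbors outvote
    # the opposite-parity 1-bit neighbors
    print(knn_parity_error())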

Second, backprop generalizes parity pretty well (on NOVEL test cases) provided
you give it a chance.  If we use the "standard" net with n hidden units (we
can get by with fewer) we have (n+1)^2 connections.  To get several bits of
constraint per connection we need p 2^n >> (n+1)^2, where p is the fraction of
all possible cases that are used for training.  For n=6 there are only 64
possible cases and we use 49 connections, so this isn't possible.  For n=10,
if we train on 512 cases we only get around 3 or 4% errors on the remaining
cases.  Of course training is slow for 10-bit parity: about 2000 epochs even
with adaptive learning rates on each connection (and probably a million epochs
if you choose a bad enough version of backprop).
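
Here is a minimal sketch of this setup: a single hidden layer of n sigmoid
units trained by plain batch gradient descent with a cross-entropy error.  The
initialization, learning rate, and epoch count are illustrative assumptions
(it uses a single global rate rather than the per-connection adaptive rates
mentioned above), and how close a given run gets to the 3-4% figure depends
on them, since plain backprop on parity is sensitive to initialization:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    # all 2^n cases and their parities
    X = np.array([[(i >> b) & 1 for b in range(n)] for i in range(2 ** n)],
                 dtype=float)
    y = X.sum(axis=1) % 2
    idx = rng.permutation(2 ** n)
    tr, te = idx[:512], idx[512:]          # 512 training cases, as in the text

    # "standard" net: n sigmoid hidden units, one sigmoid output,
    # (n+1)^2 connections counting biases
    W1 = rng.normal(0, 1.0, (n, n)); b1 = np.zeros(n)
    W2 = rng.normal(0, 1.0, n);      b2 = 0.0
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 0.3
    for epoch in range(20000):
        h = sig(X[tr] @ W1 + b1)           # hidden activities
        out = sig(h @ W2 + b2)             # output probability
        d_out = out - y[tr]                # dE/dz at the output for cross-entropy
        d_h = np.outer(d_out, W2) * h * (1 - h)
        W2 -= lr * (h.T @ d_out) / len(tr); b2 -= lr * d_out.mean()
        W1 -= lr * (X[tr].T @ d_h) / len(tr); b1 -= lr * d_h.mean(axis=0)

    pred = sig(sig(X[te] @ W1 + b1) @ W2 + b2) > 0.5
    print("off-training-set error rate:", (pred != y[te]).mean())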

Neither of these points is intended to detract from the second point that
Wolpert makes.  There are indeed other very interesting learning algorithms
that do very well on parity and other tasks.

Geoff


