Learning shift invariance

rolf@cs.rug.nl rolf at cs.rug.nl
Fri Mar 1 06:53:04 EST 1996


Dear connectionists,

First of all, thanks to Laurenz Wiskott and Jerry Feldman for laying out
the arguments and thus giving the discussion a proper foundation.

My view on the matter is the following. The part most interesting to me
is the ability to generalize, which Laurenz has labeled 4b. I would
define the challenge for a neural net to learn shift invariance as
follows.

There are N patterns and P positions. Beginning from tabula rasa, the
network is presented with ONE pattern in ALL possible positions to learn
shift invariance. For practical reasons, more than one pattern may be
required, but I would insist that shift invariance has to be learned
from a small subset of the possible patterns.

After having learned shift invariance in that way, the network should be
able to learn new patterns at a SINGLE position and then recognize them
in an invariant way at ANY position. Again, I would allow a small number
of positions. I grant that the network is NOW a structured one.
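
To make the protocol concrete, here is a rough sketch in Python/NumPy of
how I imagine the train/test split. Treating patterns as 1-D arrays,
modelling shifts as cyclic, and the name make_split are illustrative
choices on my part, not part of the proposal itself.

import numpy as np

def make_split(patterns, P, s=1, seed=0):
    """Train/test split for the shift-invariance task proposed above.

    patterns : array of shape (N, P); one base pattern per row
    P        : number of positions (modelled here as cyclic shifts)
    s        : the 'small number' of patterns / positions allowed
    """
    rng = np.random.default_rng(seed)
    N = len(patterns)
    train, test = [], []
    for i in range(N):
        # The first s patterns appear at ALL P positions, so that the
        # invariance itself can be learned from them; every further
        # pattern appears at only s positions.
        shown = range(P) if i < s else rng.choice(P, size=s, replace=False)
        for p in range(P):
            example = (np.roll(patterns[i], p), i)   # (shifted pattern, label)
            (train if p in shown else test).append(example)
    # Recognition is then tested on ALL P*N pattern/position combinations;
    # only len(train) = s*(P+N-s) examples were used for training.
    return train, test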

That is what I would call a satisfactory solution to the problem of
learning shift invariance. The network in Geoffrey Hinton's paper does
a good job, but it fails to meet this requirement. His parameters are
N=16 and P=12, and every pattern is trained at 10 (random) positions. So
the number of training examples is about 0.83*P*N, and the number of
test examples to which the network generalizes is about 0.17*P*N.
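
For the record, the counts work out as follows, using only the numbers
stated above:

# N=16 patterns, P=12 positions, every pattern trained at 10 positions.
N, P, trained_positions = 16, 12, 10
train = N * trained_positions           # 160 examples, about 0.83 * P * N
test  = N * (P - trained_positions)     #  32 examples, about 0.17 * P * N
print(train / (P * N), test / (P * N))  # -> 0.8333..., 0.1666...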

This gets awkward for larger values of N and P. The task as outlined
above would allow only s*(P+N-1) training examples, where s is the
`small number'; something like s=3 should be appropriate, s=1 desirable.
The network should then generalize and recognize all P*N examples
correctly. Note that the objection is not to the choice of parameters in
the paper but to the scaling behavior for larger parameters: the network
must have seen the patterns in almost all possible positions in order to
generalize.
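
To illustrate the scaling (with values of N and P that I have picked
arbitrarily), compare the full set of P*N examples with the proposed
training budget of s*(P+N-1):

# P*N grows multiplicatively, the proposed training budget only additively.
for N, P in [(16, 12), (100, 100), (1000, 256)]:
    full_set = P * N              # all pattern/position combinations
    budget   = 3 * (P + N - 1)    # with s = 3, the 'appropriate' small number
    print(f"N={N:5d}  P={P:4d}  P*N={full_set:7d}  s*(P+N-1)={budget:6d}")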

As far as I have followed the discussion, the goal of an O(P+N)
dependence of the training-set size has not been reached yet. I see
three possibilities for settling the issue:

1) Construct a network that solves the problem as outlined above.
2) Prove that it cannot be done.
3) Prove (experimentally) that visual perception cannot solve the
   problem.

I am very interested in any progress in one of these directions, and I
am looking forward to the further course of this discussion.

Rolf

+----------------------------------------------------------------------------+
| Rolf P. W"urtz | mailto: rolf at cs.rug.nl | URL: http://www.cs.rug.nl/~rolf/ |
| Department of Computing Science,  University of Groningen, The Netherlands |
+----------------------------------------------------------------------------+
