obvious, but false

Geoffrey Hinton hinton at cs.toronto.edu
Sun Feb 25 13:00:36 EST 1996


In response to my email about a network that learns shift invariance, Dorffner
says:

> there seems to be a misunderstanding of what the topic of discussion
> is here. I don't think that Jerry meant that no model consisting of neural
> network components could ever learn shift invariance. After all, there are
> many famous examples in visual recognition with neural networks (such as
> the Neocognitron, as Rolf Würtz pointed out), and if this impossibility
> were the case, we would have to give up neural network research in 
> perceptual modeling altogether.
> 
> What I think Jerry meant is that any cascade of fully-connected feed-forward
> connection schemes between layers (including the perceptron and the MLP) 
> cannot learn shift invariance. 
> Now besides being obvious, this does raise some
> important questions, possibly weakening the fundamentals of connectionism.

I agree that this is what Jerry meant.  What Jerry said was actually very
reasonable.  He did NOT say it was obviously impossible. He just said that it
was generally understood to be impossible and he would like to see a proof.  I
think Jerry was right in the sense that most people I have talked to believed
it to be impossible.  I'd like to apologize to Jerry for the antagonistic tone
of my previous message.  Dorffner takes the impossibility for granted.  My
simulation conclusively demonstrates that translation invariance can be
learned with no built-in bias towards translation invariance.  The only
requirement is that the shapes should share features, and this is a
requirement on the data, not on the network.  At the risk of looking very
silly, I bet that it really cannot be done if the shapes do not share
features.

Contrary to what Dorffner seems to imply, my simulation had no built-in
preprocessing or weight sharing.  So, unlike the Neocognitron, it had no
innate bias towards translation invariance.  It got the "raw" retinal inputs,
and its desired outputs were shape identities.  The version with local
connectivity worked best but, as I pointed out, it also worked without local
connectivity, so that version exactly fitted Dorffner's definition of what
cannot be done.
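For concreteness, here is a minimal sketch of the kind of experiment described
above.  It is not the original simulation: the retina size, the shared-feature
pool, the even/odd split of training and test positions, the network sizes,
and the training details are all illustrative assumptions.  Whether accuracy
at the never-seen positions beats chance is exactly the empirical question at
issue.

import numpy as np

rng = np.random.default_rng(0)

R = 16                       # retina size -- an assumption, not from the post
# Shapes built out of shared 3-pixel features (the "shapes share features"
# requirement); the particular feature pool and pairings are illustrative.
# Note that every shape activates the same number of pixels, so by the
# perceptron argument in the PS below, hidden units are genuinely needed.
features = [np.array([1, 0, 1]), np.array([1, 1, 0]), np.array([0, 1, 1])]
pairs = [(0, 1), (1, 2), (2, 0), (0, 2)]
shapes = [np.concatenate([features[i], features[j]]) for i, j in pairs]

def render(shape, pos):
    # Place a shape on the retina at position pos, with wraparound.
    x = np.zeros(R)
    x[(np.arange(len(shape)) + pos) % R] = shape
    return x

# Train on even positions, test on the held-out odd ones (an assumption).
train_pos = range(0, R, 2)
test_pos = range(1, R, 2)

def dataset(positions):
    X = np.array([render(s, p) for s in shapes for p in positions])
    y = np.array([k for k in range(len(shapes)) for _ in positions])
    return X, y

Xtr, ytr = dataset(train_pos)
Xte, yte = dataset(test_pos)

# Fully connected net with one hidden layer: raw pixels in, shape identity
# out, plain backprop, no weight sharing and no local connectivity.
H, C = 64, len(shapes)
W1 = rng.normal(0, 0.1, (R, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, C)); b2 = np.zeros(C)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)    # softmax over shapes

lr = 0.2
for epoch in range(5000):                         # full-batch gradient descent
    h, p = forward(Xtr)
    d = p.copy(); d[np.arange(len(ytr)), ytr] -= 1; d /= len(ytr)
    dh = (d @ W2.T) * (1 - h * h)                 # backprop through tanh
    W2 -= lr * (h.T @ d);   b2 -= lr * d.sum(0)
    W1 -= lr * (Xtr.T @ dh); b1 -= lr * dh.sum(0)

# Does accuracy at the never-seen positions beat chance (0.25 here)?
for name, X, y in [("train", Xtr, ytr), ("test", Xte, yte)]:
    _, p = forward(X)
    print(name, "accuracy:", (p.argmax(1) == y).mean())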

Geoff

PS: As I noted in the paper and others have pointed out in their responses,
Minsky and Papert's group invariance theorem really does prove that this task
cannot be done without hidden layers (using conventional units).



