shift invariance

Jerry Feldman jfeldman at ICSI.Berkeley.EDU
Wed Feb 28 13:00:39 EST 1996


 There seem to be three separate threads arising from my cryptic post and it
might be useful to separate them.

1) The capabilities of spatial feedforward nets and backprop (ffbp)

  Everyone now seems to agree that conventional feedforward nets with backprop
(ffbp) will not learn the simple 0*1010* languages of my posting. Of course
any formal technique has limitations; the interesting point is that shift
invariance is a basic property of apparent biological significance. Geoff
Hinton's series of messages asserts that the world and experience are (could
be?) structured such that ffbp will learn shift invariance in practice because
patterns overlap and are dense enough in space. My inference is that Geoff
would like to extend this claim (the world makes ffbp work) to everything of
biological importance. Results along these lines would be remarkable indeed.
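For concreteness, here is a minimal sketch (not from the original post; the
choice of a regex-based membership test is my own) of what the 0*1010* language
looks like: the fixed pattern "101" may sit at any shift within a field of
zeros, which is exactly the shift invariance a position-wise feedforward net
has trouble acquiring.

```python
import re

# The 0*1010* language: zeros, then the pattern "101", then zeros.
LANG = re.compile(r"0*1010*")

def in_language(s: str) -> bool:
    """Membership test: exactly one '101', shifted arbitrarily among zeros."""
    return LANG.fullmatch(s) is not None

# The same pattern at different shifts -- all positive examples:
assert all(in_language(s) for s in ["101", "0101", "1010", "0010100"])

# Strings that perturb the pattern are rejected:
assert not in_language("1001")
assert not in_language("10101")
```

A net with fixed input positions must in effect relearn the "101" detector at
every shift, whereas the regular expression captures all shifts at once.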

2) Understanding how the visual system achieves shift invariance.

  This thread has been non-argumentative. The problem of invariances and
constancies in the visual system remains central in visual science. I can't
think of any useful message-sized summary, but this is an area where
connectionist models should play a crucial role in expressing and testing
theories. But, as several people have pointed out, we can't expect much from
tabula rasa learning.


3) Shift invariance in time and recurrent networks.

 I threw in some (even more cryptic) comments on this anticipating that some
readers would morph the original task into this form. The 0*1010* problem is
an easy one for FSA induction and many simple techniques might work for this.
But consider a task that is only slightly more general, and much more natural.
Suppose the task is to learn any FSL from the class b*pb* where b and p are
fixed for each case and might overlap. Any learning technique that just
tries to predict (the probability of) successors will fail, because the strings
have three distinct regimes and the learner must discover that regime structure.
I don't have a way to characterize all recurrent net learning algorithms to
show that they can't do this and it will be interesting to see if one can.
There are a variety of non-connectionist FSA induction methods that can
effectively learn such languages, but they all depend on some overall measure
of simplicity of the machine and its fit to the data - and are thus non-local.
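To make the three-regime point concrete, here is a small illustrative sketch
(my own construction, not from the post): b and p are taken as fixed strings
over the same alphabet, so that purely local successor statistics look the
same before and after p.

```python
import re

# Assumed instantiation for illustration: b and p are fixed strings
# over the shared alphabet {0, 1}, so they "overlap" in the post's sense.
b, p = "10", "11"
LANG = re.compile(f"(?:{b})*(?:{p})(?:{b})*")

def in_language(s: str) -> bool:
    """Membership in b*pb* for this particular choice of b and p."""
    return LANG.fullmatch(s) is not None

assert in_language("11")        # p alone
assert in_language("101110")    # b p b
assert not in_language("1010")  # b b, but no p

# The catch: the symbol-successor statistics inside the leading b* run
# and the trailing b* run are identical, so a learner that only predicts
# the next symbol cannot tell which of the three regimes (b*, p, b*)
# it is currently in -- it needs state, not just local prediction.
```

This is only a checker for one fixed (b, p) pair; the learning problem the
post describes is to induce such a three-state machine from example strings.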

 The remark about not extending to two dimensions referred to the fact that
we have no formal grammar for two-dimensional patterns (although there are
several proposals for one) and, a fortiori, no algorithm for learning same. One
can, as Jamie Henderson suggests, try to linearize two-dimensional problems.
But no one has done this successfully for shift (rotation, scale, etc.)
invariance and it doesn't seem to me a promising approach to these issues.

Jerry F.
