shift invariance

jamie@atlas.ex.ac.uk
Mon Feb 26 15:35:05 EST 1996


Jerry Feldman writes:
> Shift invariance is the ability of a neural system to recognize a pattern
>independent of where it appears on the retina. It is generally understood that
>this property can not be learned by neural network methods,

I agree with Jerry that connectionist networks cannot actually learn shift
invariance.  A connectionist network can exhibit shift invariant behavior by
either being exhaustively trained on a set of patterns that happens to have
this property, or by having shift invariance somehow wired into the network
before the learning occurs.  However, neither of these situations constitutes
"learning shift invariance".  On the other hand, we are still left with the
problem of explaining shift invariant behavior.  Some of the responses so
far imply training on all patterns in all positions (exhaustive training).
I don't find this approach interesting, since it doesn't address the basic
issue of generalization ability.  Thus the question seems to be how shift
invariance can be wired in while still using a learning rule that is local,
biologically plausible, etc.

Geoff Hinton writes:
>shift invariance can be learned by backpropagation. It was one of the
>problems that I tried when fiddling about with backprop in the mid 80's.

Geoff Hinton's experiment clearly does not train the network exhaustively on
all patterns in all positions, so (by the above argument) I have to claim
that shift invariance is wired in.  The network does not use weight sharing,
which would be the most direct way of wiring in shift invariance.  However,
it does appear to use "error sharing".  As I understood Geoff's description,
the weights between the position-dependent and position-independent hidden
layers are not modified by learning.  Each position-independent feature
detector is connected to all its associated position-dependent feature
detectors with links of the same weight (in particular, they are ORed).
Using backprop, this has the effect of distributing the same error signal to
each of these position-dependent feature detectors.  Thus they all tend to
converge to the same feature.  In this way, fixing the weights between the
two hidden layers to equal values makes the feature detectors learned in one
position tend to generalize to other positions.  I suspect "tend to" may be
an important caveat here, but in essence the equivalence of all positions
has been wired in.  This equivalence is simply shift invariance.  On the
other hand, the learning rule is still local, so in that sense it does seem
to meet Jerry's challenge.
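
To make the mechanics of this "error sharing" concrete, here is a rough
sketch in Python.  It is my own illustrative construction, not Geoff's
actual setup: the layer sizes, the soft-OR output unit with its fixed bias,
and the training regime are all assumptions made for illustration.  The
point is only to show how fixed, equal weights from the position-dependent
detectors to the position-independent unit cause backprop to hand every
position the same error signal, while each weight update remains a function
of the unit's own activity.

import numpy as np

rng = np.random.default_rng(0)

P, K = 5, 3                                 # P positions, each seeing a K-pixel patch
W = rng.normal(scale=0.1, size=(P, K))      # one trainable detector per position
v = 1.0                                     # fixed, shared weight to the position-independent unit
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

feature = np.array([1.0, 0.0, 1.0])         # the pattern to be recognized anywhere

for _ in range(5000):
    present = rng.integers(2)               # is the pattern shown on this trial?
    x = rng.normal(scale=0.1, size=(P, K))  # background noise at every position
    if present:
        x[rng.integers(P)] += feature       # drop the pattern at a random position
    d = sigmoid(np.sum(W * x, axis=1))      # position-dependent detectors
    y = sigmoid(v * d.sum() - 2.0)          # position-independent unit: a soft OR (fixed bias)
    delta_y = (y - present) * y * (1 - y)   # error at the output
    delta_d = delta_y * v * d * (1 - d)     # the SAME v * delta_y reaches every position
    W -= lr * delta_d[:, None] * x          # local update: own input activity times own error

print(np.round(W, 2))                       # rows tend toward similar detectors

Whether the rows actually end up identical depends on the training details,
which is exactly the "tend to" caveat above.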


Jerry Feldman also writes:
> The one dimensional case of shift invariance can be handled by treating
>each string as a sequence and learning a finite-state acceptor. But the
>methods that work for this are not local or biologically plausible and
>don't extend to two dimensions.

I have to disagree with this dismissal of recurrent networks for handling
shift invariance.  I'm not in a position to judge biological plausibility,
but I would take issue with the claim that methods of training recurrent
networks are nonlocal and can't be generalized to higher dimensions.  These
learning methods are to some extent temporally nonlocal, but some degree of
temporal nonlocality is necessary for any computation that extends over
time.  The important thing is that they are just as spatially local as the
feedforward methods they are based on.  Jerry's own definition of locality
is spatial locality:
>A "local" learning rule is one that updates the input weights of a unit as a
>function of the unit's own activity and some performance measure for the
>network on the training example.
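
To see why I say these methods are spatially local, here is a minimal
backprop-through-time sketch (a generic construction of my own, not any
particular published training procedure).  The network sizes and the
stand-in task errors are assumptions; what matters is the update line: the
gradient for a recurrent weight accumulates the pre-synaptic unit's activity
times the error signal arriving at the post-synaptic unit, which is local in
exactly the sense Jerry defines, even though the activity history has to be
stored over time (the temporal nonlocality).

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 4, 6
W_in  = rng.normal(scale=0.1, size=(n_in, n_hid))
W_rec = rng.normal(scale=0.1, size=(n_hid, n_hid))

def forward(xs):
    """Run the recurrent net over a sequence, storing the activity history."""
    hs = [np.zeros(n_hid)]
    for x in xs:
        hs.append(np.tanh(x @ W_in + hs[-1] @ W_rec))
    return hs

def bptt_recurrent_grad(xs, deltas, hs):
    """deltas[t] is the task error arriving at the hidden units at step t."""
    grad = np.zeros_like(W_rec)
    d = np.zeros(n_hid)                            # error flowing back from later steps
    for t in reversed(range(len(xs))):
        d = (deltas[t] + d @ W_rec.T) * (1.0 - hs[t + 1] ** 2)
        grad += np.outer(hs[t], d)                 # pre-synaptic activity times post-synaptic error
    return grad

xs = [rng.normal(size=n_in) for _ in range(5)]
hs = forward(xs)
deltas = [rng.normal(scale=0.01, size=n_hid) for _ in range(5)]   # stand-in task errors
print(bptt_recurrent_grad(xs, deltas, hs).shape)                  # (n_hid, n_hid)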

Now finally I get to my primary gripe.  Contrary to Jerry's claim, learning
methods for recurrent networks can be generalized to more than one
dimension.  The issues for two dimensions are entirely the same as those for
one.  All that is necessary to extend recurrence to two dimensions is units
that pulse periodically.  In engineering terms, a single network is
time-multiplexed across one dimension while being sequenced across the
other.  Conceptually, learning can be done by unfolding the network over one
time dimension, then unfolding the result over the other time dimension,
then using a feedforward method.  
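
As a concrete illustration of this double unfolding, here is a sketch of
the forward pass of a single shared cell swept over both dimensions of an
image.  It is my own construction for this message, not a specific
published architecture: the cell at position (i, j) sees the pixel there
plus the states from the row above and the column to the left, exactly as a
one-dimensional recurrent net sees its previous time step.  Once unrolled
like this, the whole computation is an ordinary feedforward graph, and
because the same weights are reused at every position, whatever is learned
at one position is automatically shared with all the others.

import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 1, 8
W_in  = rng.normal(scale=0.1, size=(n_in, n_hid))
W_row = rng.normal(scale=0.1, size=(n_hid, n_hid))   # recurrence down the rows
W_col = rng.normal(scale=0.1, size=(n_hid, n_hid))   # recurrence across the columns

def unfold_2d(image):
    """image: (H, W, n_in).  Sweep one shared cell over both dimensions."""
    H, Wd, _ = image.shape
    h = np.zeros((H + 1, Wd + 1, n_hid))             # zero states along the borders
    for i in range(H):                               # one "time" dimension
        for j in range(Wd):                          # the other, multiplexed, dimension
            h[i + 1, j + 1] = np.tanh(
                image[i, j] @ W_in
                + h[i, j + 1] @ W_row                # state from the row above
                + h[i + 1, j] @ W_col                # state from the column to the left
            )
    return h[1:, 1:]

states = unfold_2d(rng.normal(size=(6, 6, n_in)))
print(states.shape)                                  # (6, 6, n_hid)

Gradients for W_in, W_row, and W_col would then be accumulated by ordinary
backpropagation over this unrolled graph, just as in the one-dimensional
case above.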

The idea of using time to represent two different dimensions has in fact
already been proposed.  At an abstract level, this dual use of the time
dimension is the core idea behind temporal synchrony variable binding (TSVB)
(Shastri and Ajjanagadde, 1993).  Recurrent networks use the time dimension
to represent position in the input sequence (or computation sequence).  TSVB
also uses the time dimension to represent variables.  Because the same
network is being used at every point in the input sequence, recurrent
networks inherently generalize things learned in one input sequence position
to other input sequence positions.  In this way shift invariance is "wired
in".  Exactly analogously, because the same network is being used for every
variable, TSVB networks inherently generalize things learned using one
variable to other variables.  I argue in (Henderson, submitted) that this
results in a network that exhibits systematicity.  Replacing the labels
"sequence position" and "variable" with the labels "horizontal position" and
"vertical position" does not change this basic ability to generalize across
both dimensions.  Work on applying learning to TSVB networks is being done
by both Shastri and myself.  (Note that this description of TSVB is at a
very abstract level.  Issues of biological plausibility are addressed in
(Shastri and Ajjanagadde, 1993) and the papers cited there.)


Shastri, L. and Ajjanagadde, V. (1993).  From simple associations to
systematic reasoning: A connectionist representation of rules, variables,
and dynamic bindings using temporal synchrony.  Behavioral and Brain
Sciences, 16:417--451.

Henderson, J. (submitted). A connectionist architecture with inherent
systematicity.  Submitted to the Eighteenth Annual Conference of the
Cognitive Science Society.

					- Jamie

------------------------------
Dr James Henderson
Department of Computer Science
University of Exeter
Exeter EX4 4PT, U.K.
------------------------------


