sequential interference

Jaap Murre jaap.murre at mrc-apu.cam.ac.uk
Wed Dec 21 12:50:03 EST 1994


In response to recent comments by Neil Burgess, Bob French, Phil
Hetherington, Noel Sharkey, Jay McClelland and others on 'catastrophic
interference', I think it is important to establish that 'vanilla
backpropagation' has by now been eliminated as a valid model of human
learning and memory, its implausible learning transfer being the most
obvious failure. An important reason for this failure can be found in the
nature of the hidden-layer representations.
      Catastrophic interference and hypertransfer (i.e., excessive positive
transfer; see Murre, in press a) are two sides of the same coin: hidden-
layer representations in backpropagation bear little relationship to the
orthogonality of the input patterns. In human interference experiments
the following result is well established: if the input patterns (stimuli)
in two consecutive learning sets A and B are different, then there will be
neither interference nor positive transfer (e.g., Osgood, 1949). This
is not the case in backpropagation: orthogonality of the stimuli has no
effect on the reduction of interference. The reason for this is that the
hidden-layer representations are always about equally overlapping *no
matter what the input stimuli are*. It is precisely this indifference to
the structure of the input stimuli that makes interference and transfer in
backpropagation psychologically implausible. Surprisingly enough, two-
layer networks (i.e., networks with a single layer of weights, trained
with the normal delta rule) do not suffer from this problem and are in
fact well able to model human interference (see Murre, in press a).
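      To make the contrast concrete, here is a minimal sketch in Python
with NumPy (my illustration, not the simulations reported in Murre, in
press a; the network sizes, targets, learning rates, and epoch counts are
arbitrary assumptions). Two tasks, A and B, use strictly orthogonal input
patterns. A one-layer delta-rule network retains A perfectly after
learning B, because B's weight updates touch only rows that A's inputs
never activate; a vanilla backpropagation network forgets A, because its
hidden codes for the two tasks overlap heavily despite the orthogonal
inputs:

  import numpy as np

  rng = np.random.default_rng(0)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  # Strictly orthogonal inputs: task A uses the first four basis vectors
  # of an 8-dimensional input space, task B the remaining four.
  I = np.eye(8)
  xa, xb = I[:4], I[4:]
  ya = np.array([[0.], [1.], [0.], [1.]])   # arbitrary targets for A
  yb = 1.0 - ya                             # conflicting targets for B

  def train_delta(X, Y, W, lr=0.5, epochs=200):
      # One layer of weights, linear output: the plain delta rule.
      for _ in range(epochs):
          W += lr * X.T @ (Y - X @ W)
      return W

  def train_backprop(X, Y, W1, W2, lr=0.5, epochs=2000):
      # Vanilla backpropagation with one sigmoid hidden layer (no biases).
      for _ in range(epochs):
          H = sigmoid(X @ W1)
          out = sigmoid(H @ W2)
          d_out = (out - Y) * out * (1 - out)
          d_hid = (d_out @ W2.T) * H * (1 - H)
          W2 -= lr * H.T @ d_out
          W1 -= lr * X.T @ d_hid
      return W1, W2

  def bp_error(X, Y, W1, W2):
      return np.abs(Y - sigmoid(sigmoid(X @ W1) @ W2)).max()

  # Delta rule: learn A, then B. A is untouched because B's updates
  # change only weight rows that A's (orthogonal) inputs never activate.
  W = train_delta(xb, yb, train_delta(xa, ya, np.zeros((8, 1))))
  print("delta rule, error on A after B:", np.abs(ya - xa @ W).max())

  # Backprop: learn A, measure hidden-code overlap with B, then learn B.
  W1 = rng.normal(0.0, 0.5, (8, 4))
  W2 = rng.normal(0.0, 0.5, (4, 1))
  W1, W2 = train_backprop(xa, ya, W1, W2)
  HA, HB = sigmoid(xa @ W1), sigmoid(xb @ W1)
  cos = (HA @ HB.T) / np.outer(np.linalg.norm(HA, axis=1),
                               np.linalg.norm(HB, axis=1))
  print("inputs orthogonal, yet mean hidden-code cosine:", cos.mean())
  print("backprop, error on A before B:", bp_error(xa, ya, W1, W2))
  W1, W2 = train_backprop(xb, yb, W1, W2)
  print("backprop, error on A after  B:", bp_error(xa, ya, W1, W2))

On a run like this the delta-rule error on A stays at essentially zero,
while the backpropagation error on A climbs back toward chance: the
interference is driven entirely by the overlapping hidden codes, not by
the (orthogonal) inputs.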
      Having said this, it is also important to consider the various variant
models of error-correcting learning that have been developed recently,
taking backpropagation as a starting point. These models show very
plausible behavior with respect to human learning and categorization:
Kruschke (1992), Gluck (1991), Gluck and Bower (1988, 1990),
Nosofsky, Kruschke, and McKinley (1992), Shanks and Gluck (1994). In
addition, there are now many variant models of backpropagation that do
not suffer from catastrophic interference. These have already been
mentioned in this discussion, so I will not repeat them here.
      From a biological point of view backpropagation is certainly
implausible, but it has been shown to be useful for inferring biologically
plausible parameters (e.g., Lockery et al., 1989; Zipser and Andersen, 1988).

I can think of several reasons why backpropagation continues to attract
researchers:

1. It is easy to understand.

2. Many simulators are available (see Murre, in press b).

3. It has been shown to approximate all 'well-behaved' functions (e.g.,
Hornik, Stinchcombe, and White, 1989) and is thus often felt to qualify as
a generic learning mechanism. In particular, it can learn pattern sets that
are not linearly separable (a minimal XOR sketch follows this list).

4. It has only a few free parameters, and these are not very critical for
the final results.

5. It possesses most of the basic elements of a 'prototypical' neural
network: distributed representations, graceful degradation, pattern
completion, and adequate generalization of learned behavior. 
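
On point 3 above, here is a minimal XOR sketch in the same style (again
my illustration, not from any of the papers cited; the four hidden units
and the learning rate are arbitrary choices, and two hidden units suffice
in principle but converge less reliably from random starts):

  import numpy as np

  rng = np.random.default_rng(1)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  # XOR: the classic pattern set that no single layer of weights can learn.
  X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
  Y = np.array([[0.], [1.], [1.], [0.]])

  W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
  W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

  lr = 1.0
  for _ in range(5000):
      H = sigmoid(X @ W1 + b1)
      out = sigmoid(H @ W2 + b2)
      d_out = (out - Y) * out * (1 - out)    # squared-error gradient
      d_hid = (d_out @ W2.T) * H * (1 - H)
      W2 -= lr * H.T @ d_out; b2 -= lr * d_out.sum(axis=0)
      W1 -= lr * X.T @ d_hid; b1 -= lr * d_hid.sum(axis=0)

  print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
  # Should print values close to [0, 1, 1, 0].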

I do not myself think that these reasons necessarily form a sufficient
motivation for using backpropagation, and I indeed prefer to work on other
types of learning methods and neural networks. Perhaps, by investigating
the limitations of backpropagation, the necessary minimal improvements
will become clear, so that we can replace it in its leading role by either
a more plausible variant algorithm or a completely different learning
method.

Merry Christmas,

                      -- Jaap Murre



References

Gluck, M.A. (1991). Stimulus generalization and representation in adaptive
      network models of category learning. Psychological Science, 2, 50-
      55.
Gluck, M.A., & G.H. Bower (1988). From conditioning to category
      learning: an adaptive network model. Journal of Experimental
      Psychology: General, 117, 227-247.
Gluck, M.A., & G.H. Bower (1990). Component and pattern information in
      adaptive networks. Journal of Experimental Psychology: General,
      119, 105-109.
Kruschke, J.K. (1992). ALCOVE: an exemplar-based connectionist model
      of category learning. Psychological Review, 99, 22-44.
Lockery, S.R., G. Wittenberg, W.B. Kristan, Jr., & G.W. Cottrell
      (1989). Function of identified interneurons in the leech elucidated
      using neural networks trained by back-propagation. Nature, 340, 468-
      471.
Murre, J.M.J. (in press a). Transfer of learning in backpropagation and in
      related neural network models. In: J. Levy, D. Bairaktaris, J.
      Bullinaria, & P. Cairns (Eds.), Connectionist Models of Memory and
      Language. London: UCL Press. (In our ftp site:
      ftp://ftp.mrc-apu.cam.ac.uk/pub/nn/murre/hyper1.ps) 
Murre, J.M.J. (in press b). Neurosimulators. In: M.A. Arbib (Ed.), The
      Handbook of Brain Theory and Neural Networks. Cambridge, MA:
      MIT Press. (In our ftp site:
      ftp://ftp.mrc-apu.cam.ac.uk/pub/nn/murre/neurosim1.ps).
Nosofsky, R.M., J.K. Kruschke, & S.C. McKinley (1992). Combining
      exemplar-based category representations and connectionist learning
      rules. Journal of Experimental Psychology: Learning, Memory, and
      Cognition, 18, 211-233.
Osgood, C.E. (1949). The similarity paradox in human learning: a
      resolution. Psychological Review, 56, 132-143.
Shanks, D.R., & M.A. Gluck (1994). Tests of an adaptive network model
      for the identification and categorization of continuous-dimension
      stimuli. Connection Science, 6, 59-89.
Zipser, D., & R.A. Andersen (1988). A back-propagation programmed
      network that simulates response properties of a subset of posterior
      parietal neurons. Nature, 331, 679-684.



