Catastrophic forgetting and sequential learning
Bob French
french at cogsci.indiana.edu
Thu Dec 15 14:28:01 EST 1994
Below I have indicated a number of references on current work on the
problem of sequential learning in connectionist networks. All of these
papers address the problem of catastrophic interference, which may
result when a previously trained connectionist network attempts to
learn new patterns.
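The effect itself is easy to reproduce. Here is a minimal sketch of my own (not taken from any of the papers below): a single linear unit trained with the delta rule on one pattern set, then trained on a second, overlapping set, loses much of the first mapping.

```python
import random

random.seed(0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

class LinearUnit:
    """A single linear output unit trained with the delta (Widrow-Hoff) rule."""
    def __init__(self, n_inputs):
        self.w = [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]

    def train(self, patterns, epochs=500, lr=0.1):
        for _ in range(epochs):
            for x, target in patterns:
                err = target - dot(self.w, x)
                self.w = [wi + lr * err * xi for wi, xi in zip(self.w, x)]

def mse(unit, patterns):
    return sum((t - dot(unit.w, x)) ** 2 for x, t in patterns) / len(patterns)

# Two pattern sets with overlapping (non-orthogonal) inputs -- the
# condition under which interference is worst.
set_a = [([1.0, 1.0, 0.0, 0.0], 1.0), ([0.0, 1.0, 1.0, 0.0], 0.0)]
set_b = [([1.0, 0.0, 1.0, 0.0], 0.0), ([0.0, 1.0, 0.0, 1.0], 1.0)]

unit = LinearUnit(4)
unit.train(set_a)
error_before = mse(unit, set_a)   # near zero: set A has been learned
unit.train(set_b)                 # sequential training on set B alone
error_after = mse(unit, set_a)    # error on set A rises: interference
```

With orthogonal input patterns the second round of training would leave the first mapping untouched; the overlap is what does the damage.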
The following commented list is by no means complete. In
particular, no mention is made of convolution-correlation models with
their associated connectionist networks. Nonetheless, I hope it
might prove to be a useful introduction to people interested in
knowing a bit more about the subject.
Bob French
french at cogsci.indiana.edu
----------------------------------------------------------------------------
Recent Work in Catastrophic Interference
in Connectionist Networks
The two papers that really kicked off research in this area were:
McCloskey, M. & Cohen, N. (1989) "Catastrophic interference in connectionist
networks: the sequential learning problem" The Psychology of
Learning and Motivation, 24, 109-165.
Ratcliff, R. (1990) "Connectionist models of recognition memory: constraints
imposed by learning and forgetting functions" Psychological Review,
97, 285-308.
Early on, Hetherington and Seidenberg suggested an Ebbinghaus-like
"savings" measure of catastrophic interference and, on that basis,
concluded that catastrophic interference wasn't really as serious a
problem as had been thought. Although the problem has since been
confirmed to be quite serious, the "savings" measure they proposed is
still widely used to gauge the extent of forgetting.
Hetherington, P. & Seidenberg, M. (1989) "Is there 'catastrophic
interference' in connectionist networks?" Proceedings of the
11th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 26-33.
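The savings measure is simple to state: compare the effort needed to relearn old material with the effort the original learning took. A minimal sketch (the function name and the figures in the example are illustrative, not from the paper):

```python
def savings(original_trials, relearning_trials):
    """Ebbinghaus-style savings score: the fraction of the original
    learning effort saved when relearning the same material.
    1.0 = instant relearning (the knowledge survived in the weights),
    0.0 = relearning costs as much as learning from scratch."""
    return (original_trials - relearning_trials) / original_trials

# e.g. 100 epochs to learn a pattern set initially, but only 30 epochs
# to relearn it after interfering training on other patterns:
print(savings(100, 30))  # -> 0.7
```

The point of the measure is that a network can look catastrophically forgetful on a direct retention test yet still show substantial savings at relearning time.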
Kortge was one of the first to propose a solution to this problem,
using what he called "novelty vectors".
Kortge, C. (1990) "Episodic Memory in Connectionist Networks" Proceedings
of the 12th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 764-771.
Sloman and Rumelhart also developed a technique called "episodic gating"
designed to reduce the severity of the problem.
Sloman, S. & Rumelhart, D., (1991) "Reducing interference in distributed
memories through episodic gating" In Healy, Kosslyn, & Shiffrin (eds.)
Essays in Honor of W. K. Estes.
In 1991 French suggested that catastrophic interference might be the
inevitable price you pay for the advantages of fully distributed
representations (in particular, generalization). He suggested a way
of dynamically producing "semi-distributed" hidden-layer
representations to reduce the effect of catastrophic interference.
French, R. (1991) "Using semi-distributed representations to overcome
catastrophic forgetting in connectionist networks" in Proceedings
of the 13th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 173-178.
A more detailed article presenting the same technique, called
activation sharpening, appeared in Connection Science.
French, R. (1992) "Semi-distributed Representations and Catastrophic
Forgetting in Connectionist Networks", Connection Science,
Vol. 4: 365-377.
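The sharpening step itself can be sketched as follows: after each forward pass, the k most active hidden units are nudged toward full activation and the rest are damped toward zero, and the sharpened vector then serves as a target for adjusting the input-to-hidden weights, so hidden representations drift toward a semi-distributed form. (This is my illustrative sketch; the values of k and alpha here are not those from the paper.)

```python
def sharpen(hidden, k=1, alpha=0.2):
    """Move the k most active hidden units a fraction alpha of the way
    toward 1.0, and all other units a fraction alpha of the way toward
    0.0. (k and alpha are illustrative parameter names/values.)"""
    ranked = sorted(range(len(hidden)), key=lambda i: hidden[i], reverse=True)
    top = set(ranked[:k])
    return [a + alpha * (1.0 - a) if i in top else a - alpha * a
            for i, a in enumerate(hidden)]

sharpened = sharpen([0.9, 0.4, 0.6], k=1, alpha=0.5)
# unit 0 moves toward 1.0; units 1 and 2 are damped toward 0.0
```

Because only a few units end up strongly active for any given input, new learning tends to disturb fewer of the weights that encode old patterns.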
In a more recent paper (1994), French presented a technique called
context biasing, which again dynamically "massages" hidden layer
representations based on the "context" of other recently learned
exemplars. The goal of this technique is to produce hidden-layer
representations that are simultaneously well distributed and
orthogonal.
French, R. (1994) "Dynamically constraining connectionist networks to
produce distributed, orthogonal representations to reduce
catastrophic interference" in Proceedings of the 16th Annual
Conference of the Cognitive Science Society. Hillsdale, NJ:
Erlbaum, 335-340
Finally, still in this vein, at the NIPS-93 workshop on catastrophic
interference French (1993) proposed a dynamic system of two
interacting networks working in tandem: one stores prototypes while
the other does short-term learning of new exemplars. For a "real brain"
justification for this type of architecture see McClelland,
McNaughton, and O'Reilly (1994), below. (A full paper on this tandem
network architecture will be available at the beginning of next year.)
This technique is discussed very briefly in:
French, R. (1994) "Catastrophic interference in connectionist networks:
Can it be predicted, can it be prevented?" Neural Information
Processing Systems - 6, Cowan, Tesauro, Alspector (eds.)
San Francisco, CA: Morgan Kaufmann. 1176-1177.
Steve Lewandowsky has also been very active in this area. He
developed a simple technique in 1991 that focused on producing
orthogonalization at the input layer rather than the hidden layer.
This "symmetric vectors" technique is discussed in:
Lewandowsky, S. & Li, S.-C. (1993) "Catastrophic Interference in
Neural Networks: Causes, solutions and data" in New
Perspectives on Interference and Inhibition in Cognition.
Dempster & Brainerd (eds.) New York, NY: Academic Press.
Lewandowsky, S. (1991) "Gradual unlearning and catastrophic
interference: a comparison of distributed architectures"
In Hockley & Lewandowsky (eds.) Relating Theory and
Data: Essays on Human Memory in Honor of Bennet B.
Murdock. (pp. 445-476). Hillsdale, NJ: Lawrence Erlbaum.
and in an earlier University of Oklahoma psychology department
technical report:
Lewandowsky, S. (1993) "On the relation between catastrophic
interference and generalization in connectionist networks"
In 1993 McRae and Hetherington published a study using pre-training to
eliminate catastrophic interference.
McRae, K. & Hetherington, P. (1993) "Catastrophic interference is
eliminated in pretrained networks" in Proceedings of the
15th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 723-728.
John Kruschke discussed the problem of catastrophic forgetting at
length in the context of his connectionist model, ALCOVE, and showed
the extent to which and under what circumstances this model is not
subject to catastrophic forgetting.
Kruschke, J. (1993) "Human category learning: implications for
backpropagation models", Connection Science, Vol. 5, No. 1.
Jacob Murre has also examined how his model, CALM, performs on
the sequential learning problem. See, in particular:
Murre, J. (1992) Learning and Categorization in Modular Neural Networks.
Hillsdale, NJ: Lawrence Erlbaum. (see esp. ch. 7.4)
It is to be noted that both ALCOVE and CALM rely, at least in part,
on reducing the distributedness of their internal representations in
order to achieve improved performance on the problem of catastrophic
interference.
A 1994 article (in press) by Anthony Robins presents a novel
technique, called "pseudorehearsal", whereby "pseudoexemplars" that
reflect prior learning are added to the new data set to be learned in
order to reduce catastrophic forgetting.
Robins, A. "Catastrophic forgetting, rehearsal, and pseudorehearsal",
University of Otago (New Zealand) computer science technical
report. (copies: coscavr at otago.ac.nz)
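The core of pseudorehearsal is easy to sketch: probe the already-trained network with random inputs, take its current responses as targets, and rehearse these "pseudoitems" alongside the genuinely new patterns. A minimal sketch (the names and the stand-in network are mine, not Robins'):

```python
import random

random.seed(1)

def make_pseudoitems(net_forward, n_items, input_size):
    """Build pseudoexemplars for pseudorehearsal: random binary inputs
    paired with whatever the already-trained network currently outputs
    for them. Rehearsing these alongside new items approximately
    preserves the old input-output function without needing access to
    the original training data."""
    items = []
    for _ in range(n_items):
        x = [random.choice([0.0, 1.0]) for _ in range(input_size)]
        items.append((x, net_forward(x)))  # target = old net's own response
    return items

# Usage sketch: mix pseudoitems with the new patterns before retraining.
old_net = lambda x: sum(x) / len(x)        # stand-in for a trained network
pseudo = make_pseudoitems(old_net, n_items=8, input_size=4)
new_training_set = pseudo + [([1.0, 0.0, 0.0, 1.0], 1.0)]  # hypothetical new item
```

The appeal of the method is that nothing about the original exemplars needs to be stored; the network's own function, sampled at random points, serves as the rehearsal material.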
Tetewsky, Shultz & Buckingham (1994) demonstrate the improvements that
result from using Fahlman's cascade-correlation learning algorithm.
Tetewsky, S., Shultz, T. and Buckingham, D. "Assessing interference and
savings in connectionist models of human recognition memory"
Department of Psychology TR, McGill University, Montreal.
(presented at 1994 Meeting of the Psychonomic Society).
Sharkey & Sharkey (1993) discussed the relation between the problem of
interference and discrimination in connectionist networks. They
conclude that sequentially trained networks using backprop will
unavoidably suffer from one or the other problem. I do not know
whether a final version of this paper is in print yet, but Noel Sharkey
is currently at the University of Sheffield, Dept. of Computer Science,
Sheffield. n.sharkey at dcs.shef.ac.uk
Sharkey, N. & Sharkey, A., "An interference-discrimination tradeoff
in connectionist models of human memory"
McClelland, McNaughton, & O'Reilly issued a CMU technical report
earlier this year (1994) in which they discuss the phenomenon of
catastrophic interference in the context of the "real world", i.e.,
the brain. They suggest that the complementary learning systems in
the hippocampus and the neocortex might be the brain's way of
overcoming the problem. They argue that this dual system provides
a means not only of rapidly acquiring new information, but also of
storing well-learned information as prototypes.
McClelland, J., McNaughton, B., & O'Reilly, R. "Why there are
complementary learning systems in the hippocampus and
neocortex: Insights from the successes and failures
of connectionist models of learning and memory" CMU
Tech report: PNP.CNS.94.1, March 1994.
-------------------------------------------------------------------------