Catastrophic forgetting and sequential learning
Bob French
french at cogsci.indiana.edu
Thu Dec 15 14:28:01 EST 1994
Below I have indicated a number of references on current work on the
problem of sequential learning in connectionist networks. All of these
papers address the problem of catastrophic interference, which may
result when a previously trained connectionist network attempts to
learn new patterns.
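The effect itself is easy to reproduce. Here is a minimal sketch of my own (not taken from any of the papers below): a single linear unit trained with the delta rule on one pattern set, then trained on a second, overlapping set, loses much of the first mapping.

```python
import random

random.seed(0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

class LinearUnit:
    """A single linear output unit trained with the delta (Widrow-Hoff) rule."""
    def __init__(self, n_inputs):
        self.w = [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]

    def train(self, patterns, epochs=500, lr=0.1):
        for _ in range(epochs):
            for x, target in patterns:
                err = target - dot(self.w, x)
                self.w = [wi + lr * err * xi for wi, xi in zip(self.w, x)]

def mse(unit, patterns):
    return sum((t - dot(unit.w, x)) ** 2 for x, t in patterns) / len(patterns)

# Two pattern sets with overlapping (non-orthogonal) inputs -- the
# condition under which interference is worst.
set_a = [([1.0, 1.0, 0.0, 0.0], 1.0), ([0.0, 1.0, 1.0, 0.0], 0.0)]
set_b = [([1.0, 0.0, 1.0, 0.0], 0.0), ([0.0, 1.0, 0.0, 1.0], 1.0)]

unit = LinearUnit(4)
unit.train(set_a)
error_before = mse(unit, set_a)   # near zero: set A has been learned
unit.train(set_b)                 # sequential training on set B alone
error_after = mse(unit, set_a)    # error on set A rises: interference
```

With orthogonal input patterns the second round of training would leave the first mapping untouched; the overlap is what does the damage.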
The following commented list is by no means complete. In
particular, no mention is made of convolution-correlation models with
their associated connectionist networks. Nonetheless, I hope it
might prove to be a useful introduction to people interested in
knowing a bit more about the subject.
Bob French
french at cogsci.indiana.edu
----------------------------------------------------------------------------
Recent Work in Catastrophic Interference
in Connectionist Networks
The two papers that really kicked off research in this area were:
McCloskey, M. & Cohen, N. (1989) "Catastrophic interference in connectionist
networks: the sequential learning problem" The Psychology of
Learning and Motivation, 24, 109-165.
Ratcliff, R. (1990) "Connectionist models of recognition memory: constraints
imposed by learning and forgetting functions" Psychological Review,
97, 285-308.
Early on, Hetherington and Seidenberg suggested an Ebbinghaus-like
"savings" measure of catastrophic interference and, on that basis,
concluded that catastrophic interference wasn't really as serious a
problem as had been thought. Although the problem has since been
confirmed to be quite serious, the "savings" measure they proposed is
still widely used to gauge the extent of forgetting.
Hetherington, P. & Seidenberg, M. (1989) "Is there 'catastrophic
interference' in connectionist networks?" Proceedings of the
11th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 26-33.
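The savings measure is simple to state: compare the effort needed to relearn old material with the effort the original learning took. A minimal sketch (the function name and the figures in the example are illustrative, not from the paper):

```python
def savings(original_trials, relearning_trials):
    """Ebbinghaus-style savings score: the fraction of the original
    learning effort saved when relearning the same material.
    1.0 = instant relearning (the knowledge survived in the weights),
    0.0 = relearning costs as much as learning from scratch."""
    return (original_trials - relearning_trials) / original_trials

# e.g. 100 epochs to learn a pattern set initially, but only 30 epochs
# to relearn it after interfering training on other patterns:
print(savings(100, 30))  # -> 0.7
```

The point of the measure is that a network can look catastrophically forgetful on a direct retention test yet still show substantial savings at relearning time.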
Kortge was one of the first to propose a solution to this problem,
using what he called "novelty vectors".
Kortge, C. (1990) "Episodic Memory in Connectionist Networks" Proceedings
of the 12th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 764-771.
Sloman and Rumelhart also developed a technique called "episodic gating"
designed to reduce the severity of the problem.
Sloman, S. & Rumelhart, D., (1991) "Reducing interference in distributed
memories through episodic gating" In Healy, Kosslyn, & Shiffrin (eds.)
Essays in Honor of W. K. Estes.
In 1991 French suggested that catastrophic interference might be the
inevitable price you pay for the advantages of fully distributed
representations (in particular, generalization). He suggested a way
of dynamically producing "semi-distributed" hidden-layer
representations to reduce the effect of catastrophic interference.
French, R. (1991) "Using semi-distributed representations to overcome
catastrophic forgetting in connectionist networks" in Proceedings
of the 13th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 173-178.
A more detailed article presenting the same technique, called
activation sharpening, appeared in Connection Science.
French, R. (1992) "Semi-distributed Representations and Catastrophic
Forgetting in Connectionist Networks", Connection Science,
Vol. 4: 365-377.
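The sharpening step itself can be sketched as follows: after each forward pass, the k most active hidden units are nudged toward full activation and the rest are damped toward zero, and the sharpened vector then serves as a target for adjusting the input-to-hidden weights, so hidden representations drift toward a semi-distributed form. (This is my illustrative sketch; the values of k and alpha here are not those from the paper.)

```python
def sharpen(hidden, k=1, alpha=0.2):
    """Move the k most active hidden units a fraction alpha of the way
    toward 1.0, and all other units a fraction alpha of the way toward
    0.0. (k and alpha are illustrative parameter names/values.)"""
    ranked = sorted(range(len(hidden)), key=lambda i: hidden[i], reverse=True)
    top = set(ranked[:k])
    return [a + alpha * (1.0 - a) if i in top else a - alpha * a
            for i, a in enumerate(hidden)]

sharpened = sharpen([0.9, 0.4, 0.6], k=1, alpha=0.5)
# unit 0 moves toward 1.0; units 1 and 2 are damped toward 0.0
```

Because only a few units end up strongly active for any given input, new learning tends to disturb fewer of the weights that encode old patterns.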
In a more recent paper (1994), French presented a technique called
context biasing, which again dynamically "massages" hidden layer
representations based on the "context" of other recently learned
exemplars. The goal of this technique is to produce hidden-layer
representations that are simultaneously well distributed and
orthogonal.
French, R. (1994) "Dynamically constraining connectionist networks to
produce distributed, orthogonal representations to reduce
catastrophic interference" in Proceedings of the 16th Annual
Conference of the Cognitive Science Society. Hillsdale, NJ:
Erlbaum, 335-340
Finally, still in this vein, at the NIPS-93 workshop on catastrophic
interference French (1993) proposed a dynamic system of two
interacting networks working in tandem: one stores prototypes while
the other does short-term learning of new exemplars. For a "real brain"
justification for this type of architecture see McClelland,
McNaughton, and O'Reilly (1994), below. (A full paper on this tandem
network architecture will be available at the beginning of next year.)
This technique is discussed very briefly in:
French, R. (1994) "Catastrophic interference in connectionist networks:
Can it be predicted, can it be prevented?" Neural Information
Processing Systems - 6, Cowan, Tesauro, Alspector (eds.)
San Francisco, CA: Morgan Kaufmann. 1176-1177.
Steve Lewandowsky has also been very active in this area. He
developed a simple technique in 1991 that focused on producing
orthogonalization at the input layer rather than the hidden layer.
This "symmetric vectors" technique is discussed in:
Lewandowsky, S. & Li, S.-C. (1993) "Catastrophic Interference in
Neural Networks: Causes, solutions and data" in New
Perspectives on Interference and Inhibition in Cognition.
Dempster & Brainerd (eds.) New York, NY: Academic Press.
Lewandowsky, S. (1991) "Gradual unlearning and catastrophic
interference: a comparison of distributed architectures"
In Hockley & Lewandowsky (eds.) Relating Theory and
Data: Essays on Human Memory in Honor of Bennet B.
Murdock. (pp. 445-476). Hillsdale, NJ: Lawrence Erlbaum.
and in an earlier University of Oklahoma psychology department
technical report:
Lewandowsky, S. (1993) "On the relation between catastrophic
interference and generalization in connectionist networks"
In 1993 McRae and Hetherington published a study using pre-training to
eliminate catastrophic interference.
McRae, K. & Hetherington, P. (1993) "Catastrophic interference is
eliminated in pretrained networks" in Proceedings of the
15th Annual Conference of the Cognitive Science Society.
Hillsdale, NJ: Erlbaum, 723-728.
John Kruschke discussed the problem of catastrophic forgetting at
length in the context of his connectionist model, ALCOVE, and showed
the extent to which and under what circumstances this model is not
subject to catastrophic forgetting.
Kruschke, J. (1993) "Human category learning: implications for
backpropagation models", Connection Science, Vol. 5, No. 1.
Jacob Murre has also examined how his model, CALM, performs on
the sequential learning problem. See, in particular:
Murre, J. (1992) Learning and Categorization in Modular Neural Networks.
Hillsdale, NJ: Lawrence Erlbaum. (see esp. ch. 7.4)
It is to be noted that both ALCOVE and CALM rely, at least in part,
on reducing the distributedness of their internal representations in
order to achieve improved performance on the problem of catastrophic
interference.
A 1994 article (in press) by Anthony Robins presents a novel
technique, called "pseudorehearsal", whereby "pseudoexemplars" that
reflect prior learning are added to the new data set to be learned in
order to reduce catastrophic forgetting.
Robins, A. "Catastrophic forgetting, rehearsal, and pseudorehearsal",
University of Otago (New Zealand) computer science technical
report. (copies: coscavr at otago.ac.nz)
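The core of pseudorehearsal is easy to sketch: probe the already-trained network with random inputs, take its current responses as targets, and rehearse these "pseudoitems" alongside the genuinely new patterns. A minimal sketch (the names and the stand-in network are mine, not Robins'):

```python
import random

random.seed(1)

def make_pseudoitems(net_forward, n_items, input_size):
    """Build pseudoexemplars for pseudorehearsal: random binary inputs
    paired with whatever the already-trained network currently outputs
    for them. Rehearsing these alongside new items approximately
    preserves the old input-output function without needing access to
    the original training data."""
    items = []
    for _ in range(n_items):
        x = [random.choice([0.0, 1.0]) for _ in range(input_size)]
        items.append((x, net_forward(x)))  # target = old net's own response
    return items

# Usage sketch: mix pseudoitems with the new patterns before retraining.
old_net = lambda x: sum(x) / len(x)        # stand-in for a trained network
pseudo = make_pseudoitems(old_net, n_items=8, input_size=4)
new_training_set = pseudo + [([1.0, 0.0, 0.0, 1.0], 1.0)]  # hypothetical new item
```

The appeal of the method is that nothing about the original exemplars needs to be stored; the network's own function, sampled at random points, serves as the rehearsal material.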
Tetewsky, Shultz & Buckingham (1994) demonstrate the improvements that
result from using Fahlman's cascade-correlation learning algorithm.
Tetewsky, S., Shultz, T. and Buckingham, D. "Assessing interference and
savings in connectionist models of human recognition memory"
Department of Psychology TR, McGill University, Montreal.
(presented at 1994 Meeting of the Psychonomic Society).
Sharkey & Sharkey (1993) discussed the relation between the problem of
interference and discrimination in connectionist networks. They
conclude that sequentially trained networks using backprop will
unavoidably suffer from one or the other problem. I do not know
whether a final version of this paper is in print yet, but Noel Sharkey
is currently at the University of Sheffield, Dept. of Computer Science,
Sheffield. n.sharkey at dcs.shef.ac.uk
Sharkey, N. & Sharkey, A., "An interference-discrimination tradeoff
in connectionist models of human memory"
McClelland, McNaughton, & O'Reilly issued a CMU technical report
earlier this year (1994) in which they discuss the phenomenon of
catastrophic interference in the context of the "real world", i.e.,
the brain. They suggest that the complementary learning systems in
the hippocampus and the neocortex might be the brain's way of
overcoming the problem. They argue that this dual system provides
a means not only of rapidly acquiring new information, but also of
storing well-learned information as prototypes.
McClelland, J., McNaughton, B., & O'Reilly, R. "Why there are
complementary learning systems in the hippocampus and
neocortex: Insights from the successes and failures
of connectionist models of learning and memory" CMU
Tech report: PNP.CNS.94.1, March 1994.
-------------------------------------------------------------------------