About sequential learning (or interference)

N.Sharkey@dcs.shef.ac.uk
Fri Dec 16 12:06:27 EST 1994


>of course avoiding "interference" is
>another way of preventing generalization.

>As usual there is a tradeoff here.

Yes, there is a trade-off, but not with interference, as Hetherington has
pointed out.

In fact, with backprop, the way to eliminate interference entirely is
to obtain a good approximation to the total underlying function that is
being sampled. For example, with an autoencoder memory, if there is
good extraction of the identity function then there will be no
interference from training on successive memory sets (and, of course,
little need for further training).
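The autoencoder point can be illustrated with a minimal linear sketch (purely illustrative sizes and data, not the nets from our simulations): once a linear autoencoder has extracted the identity map from a first memory set that spans the input space, the gradients from a second set are near zero, so training on it causes no interference. The same weights, however, also reconstruct patterns never trained on, which is the discrimination side of the trade-off discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

def train(W, X, lr=0.02, epochs=500):
    # Per-pattern gradient descent on the reconstruction error ||Wx - x||^2
    for _ in range(epochs):
        for x in X:
            err = W @ x - x
            W = W - lr * np.outer(err, x)
    return W

def recon_error(W, X):
    return float(np.mean([np.sum((W @ x - x) ** 2) for x in X]))

W = rng.normal(scale=0.1, size=(d, d))

A = rng.normal(size=(40, d))      # first memory set: spans the input space
W = train(W, A)
err_A_before = recon_error(W, A)

B = rng.normal(size=(40, d))      # second memory set, trained afterwards
W = train(W, B)
err_A_after = recon_error(W, A)   # still tiny: no retroactive interference

C = rng.normal(size=(40, d))      # patterns never trained on at all
err_novel = recon_error(W, C)     # also tiny: old/new discrimination is lost
```

Because the first set forces W toward the identity, fitting the second set asks for nothing new; but precisely for that reason, reconstruction error can no longer tell trained patterns from novel ones.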

The trade-off is between old-new discrimination and generalisation: by
definition, as one improves the other collapses.

In the paper cited by Hetherington (which is now under journal
submission), we present a formally guaranteed solution to the
interference and discrimination problems (the HARM model), but it
demands exponentially increasing computational resources. It is really
used to show the problems of other localisation solutions (French,
Murre, Kruschke etc.).

We also report some interesting empirical (simulation) results on this
trade-off in a much shorter paper:

Sharkey, N.E., & Sharkey, A.J.C. (in press). Interference and
Discrimination in Neural Net Memory. In Joe Levy, Dimitrios
Bairaktaris, John Bullinaria and Paul Cairns (Eds.), Connectionist
Models of Memory and Language. UCL Press.

If anyone is interested, I will mail them a PostScript copy of the tech report:

Sharkey, N.E., & Sharkey, A.J.C. Understanding Catastrophic
  Interference in Neural Nets. Technical Report, Department of Computer
  Science, University of Sheffield, U.K.

Abstract

A number of recent simulation studies have shown that when feedforward
neural nets are trained, using backpropagation, to memorize sets of
items in sequential blocks and without negative exemplars, severe
retroactive interference or {\em catastrophic forgetting} results.
Both formal analysis and simulation studies are employed here to show
why and under what circumstances such retroactive interference arises.
The conclusion is that, on the one hand, approximations to "ideal"
network geometries can entirely alleviate interference, but at the
cost of a breakdown in discrimination between input patterns that have
been learned and those that have not: {\em catastrophic remembering}.
On the other hand, localized geometries for subfunctions eliminate the
discrimination problem but are easily disrupted by new training sets
and thus cause {\em catastrophic interference}.  The paper concludes
with a Hebbian Autoassociative Recognition Memory (HARM) model which
provides a formally guaranteed solution to the problems of
interference and discrimination. This is then used as a yardstick with
which to evaluate other proposed solutions.
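For contrast with the autoencoder case, here is a minimal backprop sketch of the retroactive interference the abstract describes (hypothetical layer sizes and random pattern pairs, chosen for illustration; not the simulations from the report): a small net memorises one block of associations, then a second block is trained sequentially with no rehearsal of the first, and recall of the first block collapses.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 10, 16, 10

W1 = rng.normal(scale=0.3, size=(n_in, n_hid))
W2 = rng.normal(scale=0.3, size=(n_hid, n_out))

def forward(X):
    H = np.tanh(X @ W1)
    return H, np.tanh(H @ W2)

def mse(X, T):
    return float(np.mean((forward(X)[1] - T) ** 2))

def train(X, T, lr=0.05, epochs=3000):
    # Plain batch backpropagation with squared error and tanh units
    global W1, W2
    for _ in range(epochs):
        H, Y = forward(X)
        dY = (Y - T) * (1 - Y ** 2)        # output-layer delta
        dH = (dY @ W2.T) * (1 - H ** 2)    # hidden-layer delta
        W2 -= lr * H.T @ dY
        W1 -= lr * X.T @ dH

def patterns(n):
    # Random +/-1 inputs; targets shrunk to +/-0.8 so tanh can reach them
    return (rng.choice([-1.0, 1.0], size=(n, n_in)),
            0.8 * rng.choice([-1.0, 1.0], size=(n, n_out)))

A_x, A_t = patterns(5)
B_x, B_t = patterns(5)

train(A_x, A_t)
err_A_first = mse(A_x, A_t)    # low: block A has been memorised

train(B_x, B_t)                # sequential block, no negative exemplars,
err_A_after = mse(A_x, A_t)    # no rehearsal of A: recall of A is disrupted
```

Here the shared, distributed weights that fit block B are the same weights that encoded block A, so fitting B overwrites A: the localised-versus-distributed tension that the HARM analysis is about.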

noel

 Noel Sharkey  	   	   
 Professor of Computer Science   
 Department of Computer Science  
 Regent Court                    
 University of Sheffield 	 
 S1 4DP, Sheffield, UK           

N.Sharkey@dcs.shef.ac.uk

FAX: (0742) 780972



