About sequential learning (or interference)

Scott E. Fahlman sef+ at cs.cmu.edu
Wed Dec 14 00:03:50 EST 1994


Yet another approach to incremental learning can be seen in the
Cascade-Correlation algorithm.  It creates hidden units (or feature
detectors, if you prefer) one by one and freezes each one after it is
created.  The weights between these features and the outputs remain plastic
and continue to be trained.
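
A minimal sketch of that structure (in NumPy, not Fahlman's actual code)
might look like the following.  The candidate-training details of real
Cascade-Correlation -- pools of candidates, the quickprop update, the exact
correlation measure -- are simplified here to a plain covariance-climbing
step, so treat it as an illustration of "grow a unit, then freeze it," not
as the reference algorithm:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CascadeNet:
    # Hidden units are installed one at a time and then frozen; only the
    # feature-to-output weights stay plastic.

    def __init__(self, n_in, n_out):
        self.n_in, self.n_out = n_in, n_out
        self.hidden = []                       # frozen (weights, bias) pairs
        self.W_out = np.zeros((n_out, n_in))   # output weights: always trainable
        self.b_out = np.zeros(n_out)

    def _features(self, X):
        # Raw inputs plus the outputs of all frozen hidden units; each unit
        # also sees the units installed before it (the "cascade").
        feats = X
        for w, b in self.hidden:
            h = sigmoid(feats @ w + b)
            feats = np.hstack([feats, h[:, None]])
        return feats

    def _grow_outputs(self, n_feats):
        # Give newly installed hidden units output connections (initially zero).
        if self.W_out.shape[1] < n_feats:
            pad = n_feats - self.W_out.shape[1]
            self.W_out = np.hstack([self.W_out, np.zeros((self.n_out, pad))])

    def predict(self, X):
        F = self._features(X)
        self._grow_outputs(F.shape[1])
        return sigmoid(F @ self.W_out.T + self.b_out)

    def train_outputs(self, X, Y, epochs=200, lr=0.5):
        # Only the output-side weights change here; hidden units stay frozen.
        F = self._features(X)
        self._grow_outputs(F.shape[1])
        for _ in range(epochs):
            err = sigmoid(F @ self.W_out.T + self.b_out) - Y
            self.W_out -= lr * err.T @ F / len(X)
            self.b_out -= lr * err.mean(axis=0)

    def add_hidden_unit(self, X, Y, epochs=200, lr=0.5):
        # Train one candidate so its output covaries with the residual error,
        # then install it and freeze its incoming weights for good.
        F = self._features(X)
        resid = Y - self.predict(X)            # what the outputs still get wrong
        w = 0.1 * np.random.randn(F.shape[1])
        b = 0.0
        for _ in range(epochs):
            h = sigmoid(F @ w + b)
            # crude gradient ascent on the covariance between h and the residual
            g = (resid - resid.mean(axis=0)).sum(axis=1) * h * (1.0 - h)
            w += lr * F.T @ g / len(X)
            b += lr * g.mean()
        self.hidden.append((w, b))             # frozen from now on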

This means that if you train a net on training set A and then switch to a
different training set B, you may build some new hidden units for task B,
but the hidden units created for task A are not cannibalized.  The A units
may be used in task B, either directly or as inputs to new hidden units,
but they remain unchanged and available.  As task B is trained, the output
weights change, so performance on task A will generally decline, but it can
come back very quickly if task A is re-trained or if you now train with a
combined A+B task.  The point is that the time-consuming part of learning
is in creating the hidden units, and these are retained once they are
learned.
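
Continuing the hypothetical sketch above, the A-then-B schedule described
here might look like this (X_a/Y_a and X_b/Y_b are placeholder arrays, not
data from any experiment):

import numpy as np

rng = np.random.default_rng(0)
X_a, Y_a = rng.random((100, 8)), rng.integers(0, 2, size=(100, 3)).astype(float)
X_b, Y_b = rng.random((100, 8)), rng.integers(0, 2, size=(100, 3)).astype(float)

net = CascadeNet(n_in=8, n_out=3)

# Task A: alternate output training with installing (and freezing) hidden units.
for _ in range(3):
    net.train_outputs(X_a, Y_a)
    net.add_hidden_unit(X_a, Y_a)
net.train_outputs(X_a, Y_a)

# Task B: the A units stay frozen but remain available as features; only the
# output weights change, and any *new* hidden units are built for B.
for _ in range(2):
    net.train_outputs(X_b, Y_b)
    net.add_hidden_unit(X_b, Y_b)
net.train_outputs(X_b, Y_b)

# The shared output weights have drifted, so performance on A declines;
# retraining only the output weights on the combined set recovers it without
# growing any new units -- the expensive part (the hidden units) is kept.
net.train_outputs(np.vstack([X_a, X_b]), np.vstack([Y_a, Y_b]))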

My Recurrent Cascade-Correlation paper has an example of this.  This
recurrent net can be trained to recognize a temporal sequence of 1's and
0's as Morse code.  If you train all 26 letter-codes as a single training
set, the network will learn the task, but learning is faster and the
resulting net is smaller if you break the training set up into "lessons" of
increasing difficulty: first train on the shortest codes, then the
medium-length ones, then the longest ones, and finally on the whole set
together.  This is reminiscent of the modular networks explored by Waibel
and his colleagues for large speech tasks, but in this case you just have
to chop the training set up into modules -- the network architecture takes
care of itself.
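
As a rough illustration of the "lessons" idea (not the setup from the
paper), one might stage the 26 letter codes by length and present them in
order of increasing difficulty, ending with the full set; train_on is a
placeholder for whatever routine trains the recurrent net on one lesson:

MORSE = {
    'E': '.',    'T': '-',    'A': '.-',   'I': '..',   'M': '--',   'N': '-.',
    'D': '-..',  'G': '--.',  'K': '-.-',  'O': '---',  'R': '.-.',  'S': '...',
    'U': '..-',  'W': '.--',  'B': '-...', 'C': '-.-.', 'F': '..-.', 'H': '....',
    'J': '.---', 'L': '.-..', 'P': '.--.', 'Q': '--.-', 'V': '...-', 'X': '-..-',
    'Y': '-.--', 'Z': '--..',
}

def train_on(lesson):
    # Placeholder for training the recurrent net on one lesson.
    print("lesson of %d codes: %s" % (len(lesson), sorted(lesson)))

short  = {k: v for k, v in MORSE.items() if len(v) <= 2}   # easiest codes first
medium = {k: v for k, v in MORSE.items() if len(v) == 3}
longer = {k: v for k, v in MORSE.items() if len(v) == 4}

for lesson in (short, medium, longer, MORSE):              # whole set last
    train_on(lesson)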

References (these are also available online on Neuroprose, among other
places):

Scott E. Fahlman and Christian Lebiere (1990) "The Cascade-Correlation
Learning Architecture", in {\it Advances in Neural Information Processing
Systems 2}, D. S. Touretzky (ed.), Morgan Kaufmann Publishers, Los Altos
CA, pp. 524-532.

Scott E. Fahlman (1991) "The Recurrent Cascade-Correlation Architecture" in
{\it Advances in Neural Information Processing Systems 3}, R. P. Lippmann,
J. E. Moody, and D. S. Touretzky (eds.), Morgan Kaufmann Publishers, Los
Altos CA, pp. 190-196.

-- Scott

===========================================================================
Scott E. Fahlman			Internet:  sef+ at cs.cmu.edu
Principal Research Scientist		Phone:     412 268-2575
School of Computer Science              Fax:       412 268-5576 (new!)
Carnegie Mellon University		Latitude:  40:26:46 N
5000 Forbes Avenue			Longitude: 79:56:55 W
Pittsburgh, PA 15213			Mood:      :-)
===========================================================================


