What have neural networks achieved?
Bryan B. Thompson
bryan at cog-tech.com
Wed Aug 26 09:40:08 EDT 1998
Max,
Think about the structure of this argument for a moment. It runs
thus:
1. Neural networks suffer from catastrophic interference.
2. Therefore the cortical memory system suffers from catastrophic
interference.
3. That's why we might need a hippocampus.
Is everyone happy with the idea that (1) implies (2)?
Max
max at currawong.bhs.mq.edu.au
I am not happy with claim (1), above. Catastrophic interference is a
function of how global the weights involved in the network are. More
local networks are, of necessity, less prone to such interference,
because less-overlapping subsets of the weights are used to map the
transformation from input to output space. Modifying some weights may
have *no* effect on some other predictions. In the extreme case of
table lookups, it is clear that catastrophic interference completely
disappears (along with most options for generalization, etc. :) In
many ways, it seems that statement (1) is true only for supervised
learning networks in which the weights are more global than not.
Other, more practical counterexamples would include (differentiable)
CMACs and radial basis function networks.
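As a toy illustration of the locality point (my own sketch, with
arbitrary widths and tasks, not a claim about any particular published
or biological model), here is a small radial basis function example in
numpy: training on new inputs mostly moves the weights of nearby
Gaussians, so predictions for old, distant inputs change very little.

import numpy as np

def rbf_features(x, centers, width=0.5):
    # One Gaussian bump per center; a narrow width makes the code local:
    # each input activates only the few basis functions near it.
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

centers = np.linspace(-5, 5, 21)   # fixed, evenly spaced RBF centers
w = np.zeros_like(centers)         # output weights, one per center

def train(xs, ys, epochs=500, lr=0.5):
    global w
    phi = rbf_features(xs, centers)
    for _ in range(epochs):
        err = phi @ w - ys
        w -= lr * phi.T @ err / len(xs)   # gradient step on squared error

# Task A: learn sin(x) on the left half of the input space.
xa = np.linspace(-5, 0, 50)
train(xa, np.sin(xa))
pred_a_before = rbf_features(xa, centers) @ w

# Task B: later, learn cos(x) on the right half only.
xb = np.linspace(2, 5, 30)
train(xb, np.cos(xb))
pred_a_after = rbf_features(xa, centers) @ w

# Because the Gaussians used for task B barely overlap task A's inputs,
# the weights that encode task A are left nearly alone.
print("max change in task A predictions:",
      np.max(np.abs(pred_a_after - pred_a_before)))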
A related property is the degree to which a learned structure ossifies
in a network, such that the network is, more or less, unable to
respond to a changing environment. This is related to being stuck in
a local minimum, although the ossified solution may even have been
optimal for the initial environmental conditions. Networks, or
systems of networks,
which permit multiple explanatory theories to be explored at once are
less susceptible to both of these pitfalls (catastrophic interference
and ossification, or loss of plasticity).
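One cheap way to picture "multiple explanatory theories" (a toy sketch
of my own, with made-up theories and numbers) is to keep several
candidate models side by side and update only a probability
distribution over them; nothing is overwritten or averaged away, so
earlier explanations remain available when the environment shifts.

# Three competing "theories" about which marble color the jar favors.
theories = {"mostly_green": {"green": 0.8, "red": 0.2},
            "mostly_red":   {"green": 0.2, "red": 0.8},
            "even_mix":     {"green": 0.5, "red": 0.5}}
posterior = {name: 1.0 / len(theories) for name in theories}

def observe(color):
    # Bayesian update: reweight each theory by how well it predicted the
    # draw, then renormalize.  No theory is ever modified or averaged.
    for name, probs in theories.items():
        posterior[name] *= probs[color]
    total = sum(posterior.values())
    for name in posterior:
        posterior[name] /= total

# The jar favors green at first, then is swapped for one favoring red.
for color in ["green"] * 8 + ["red"] * 30:
    observe(color)

print(posterior)   # belief shifts toward "mostly_red"; the others survive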
The support of "multiple explanatory theories" opens the door to an
area which does not appear to receive enough attention: neural
architectures which perform many-to-many mappings, versus those which
learn to estimate a weighted average of their target observations.
For example, consider drawing samples from a jar of colored marbles,
or predicting the part of speech that follows in a sentence. Making
the wrong prediction is not an error; it should just lead to updating
the probability distribution over the possible outcomes. Averaging
the representations and predicting, e.g., green(.8) + red(.2) =>
greed(1.0), a blend that matches neither outcome, is the error.
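To make the contrast concrete (again a toy sketch of my own, with
hypothetical color codes), compare averaging the target
representations against keeping a distribution over the discrete
outcomes:

import numpy as np

# Hypothetical color "representations": RGB-like feature vectors.
GREEN = np.array([0.0, 1.0, 0.0])
RED = np.array([1.0, 0.0, 0.0])
draws = [GREEN] * 80 + [RED] * 20      # 80% green, 20% red

# (a) Averaging the representations: the least-squares optimum for a
#     single constant prediction is the mean target, a blend that is
#     neither color.
blend = np.mean(draws, axis=0)         # -> [0.2, 0.8, 0.0]

# (b) Many-to-many view: keep a distribution over the discrete outcomes
#     and update it with each draw; a "wrong" guess is just more evidence.
counts = {"green": 0, "red": 0}
for d in draws:
    counts["green" if d[1] == 1.0 else "red"] += 1
total = sum(counts.values())
distribution = {c: n / total for c, n in counts.items()}   # {green: .8, red: .2}

print("averaged representation:", blend)
print("distribution over outcomes:", distribution)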
So, are there good reasons for believing that cortical memory systems
(a) exhibit these pitfalls (catastrophic interference, ossification
    or loss of plasticity, and averaging of target observations), or
(b) utilize architectures which minimize such effects?
Clearly, the answer will be a mixture, but I believe that these effects
are all minimized in our adaptive behaviors.
-- bryan thompson bryan at cog-tech.com