distribution and its advantages
Timothy van Gelder
tgelder at phil.indiana.edu
Mon Jun 17 11:45:58 EDT 1991
Javier Movellan's question -- what are distributed
representations good for anyway? -- is I think an
important one for connectionism and cognitive science
generally. Trouble is, the way it was put, it presupposes that
there is some one kind of representation that everyone is
referring to when they talk about distribution. In fact, though
most people have a reasonable idea what they themselves
intend when they use the term "distributed", they usually
don't realize that it's not the way many other people use it.
This is immediately apparent if one takes an overview of the
responses that actually came in. Various people took it that a
representation is distributed if it utilizes many units rather
than just one, with the "strength" of distribution increasing
as the total number of units (or perhaps, the proportion of
available units) used increases. Massone by contrast thought
the key concept is that of redundancy, which I take roughly to
mean that a given piece of input information is represented
multiple times. This presumably requires that many units are
used (i.e., that there is distribution in the previous sense) but
is a significantly stronger requirement. Massone's position
was echoed in some other responses. Chalmers claims that a
distributed representation is one in which every
representation, whether of a basic concept or a more complex
one, has a kind of semantically significant internal structure.
This definition also seems to presuppose the first kind of
definition, but is different from redundancy. Proposing a
somewhat different definition again, French suggested that
distribution is a matter of the degree of "overlap" between
representations of different entities. And so on.
This lack of agreement over what distribution actually is is at
least partly responsible for the fact that no clear and
useful consensus on the advantages of distributed
representation really emerged in the responses to the initial
question. It manifests a wider lack of agreement over the
concept of distribution in connectionism and cognitive
science more generally. I once surveyed as many of the
definitions and occurrences of "distribution", "distributed
representation", etc., as I could find in the cognitive science
literature, and found that there were at least 5 very different
basic properties that people often refer to as distribution.
These ranged from a very simple notion of "spread-out-ness" -
each entity being represented by activity in many
units rather than just one - at one extreme, to complete
functional equipotentiality at the other. (A representation is
functionally equipotential when any part of it can stand in for
the whole thing. Holograms are famous for exhibiting a form
of equipotentiality.) Authors often picked up multiple strands
and ran them together in one characterization, or defined
distribution differently on different occasions, sometimes
even in the same work.
Probably the two most common definitions are (1) the notion
of simple extendedness just mentioned (i.e., using "many"
units to represent a given item) and (2) superimposition of
representations. We have superimposition when there are
multiple items being represented at the same time, but no
way of pointing to the discrete part of the representation
which is responsible for item A, the discrete part which is
responsible for item B, and so forth. Think of the weights in
a standard feed-forward network. Here multiple input-output
associations are represented at the same time, but there is (in
general) no separate set of weights for each association.
To see how these two senses simultaneously dominate
connectionist discussions of distribution, think again of the
answers to Movellan's question. Many of the answers took
the form, roughly, that "when I used representations
involving activity in many units rather than just one in such
and such a network, I found better (or worse!) performance".
Other responses, particularly those that made reference to
the brain or neuropsychological results, were more concerned
with the extent to which there is separate or discrete storage
of the various components of our knowledge in a given
circumscribed domain. (In these contexts, "graceful
degradation" in performance is often thought to be a
consequence of knowledge being stored in an inextricably
superimposed fashion.)
In one sense, it is not surprising that these are the two most
common notions of distribution. Perhaps the only thing that
is really clear about distribution is the opposition between
distribution and localization: whatever distributed
representations are, they are non-local. Trouble is, "local"
turns out to be ambiguous. Sometimes "local" means
restricted in extent (e.g., using only one unit rather than
many), and sometimes it means not overlapping with the
representation of anything else. The two most common
senses of "distribution" mentioned a moment ago simply
result from denying locality in these two distinct senses.
It seems to me that a necessary condition for any significant
progress on the question "what are distributed
representations good for?" is that this general state of
confusion over what "distributed" means be resolved. This
means clearly laying out the different senses that are floating
around, picking out the one that is the most central and most
theoretically significant, and giving it a reasonably precise
definition. I attempted this in Ch.1 of my PhD dissertation
(Distributed Representation, University of Pittsburgh 1989);
a shorter overview of some of the material from that chapter
has recently appeared as "What is the D in PDP? An overview
of the concept of distribution" in Stich, Ramsey & Rumelhart
(eds) Philosophy and Connectionist Theory.
In my opinion, the most important concept in the vicinity of
distribution is that of superimposition of representations,
and it is for this that the term "distributed" should really be
reserved. One advantage of this strategy is that
superimposition admits of a surprisingly clear and satisfying
mathematical definition:
Suppose R is a representation of multiple items. If the
representings of the different items are fully superimposed,
every part of the representation R must be implicated in
representing each item. If this is achieved in a non-trivial
way there must be some encoding process that generates R
given the various items to be stored, and which makes R
vary, at every point, as a function of each item. This process
will be implementing a certain kind of transformation from
items to representations. This suggests thinking of
distribution more generally in terms of mathematical
transformations exhibiting a certain abstract structure of
dependency of the output on the input. More precisely, define
any transformation from a function F to another function G
as strongly distributing just in case the value of G at any
point varies with the value of F at every point; the Fourier
transform is a classic example. Similarly, a transformation
from F to G is weakly distributing, relative to a division of
the domain of F into a number of sub-domains, just in case
the value of G at every point varies as a function of the value
of F at one or more points in each sub-domain. The classic
example here is the linear associator, in which a series of
vector pairs are stored in a weight matrix by first forming,
and then adding together, their respective outer products.
Each element of the matrix varies with every stored vector,
but only with one element of each of those vectors. (The
"functions" F and G in this case describe the input vectors
and the association matrix respectively; e.g., given an
argument specifying a place in an input vector, F returns the
value of the vector at that place.)
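
To make the linear associator case concrete, here is a minimal sketch in plain Python. The particular vectors, and the helper names (outer, store, recall, perturb, changed_cells), are illustrative assumptions and not part of the original discussion. The input vectors are chosen to be orthonormal so that recall is exact. The sketch stores two pairs by summing their outer products, then checks the dependency structure described above: changing one element of any stored input vector changes exactly one column of the matrix, so every matrix element depends on every stored vector, but only through one element of it.

```python
# Linear associator sketch: pairs (x, y) are stored by summing the outer
# products y x^T into a single weight matrix W. All names and values are
# illustrative assumptions, not taken from the original post.

def outer(y, x):
    """Outer product as a list of rows: result[i][j] = y[i] * x[j]."""
    return [[yi * xj for xj in x] for yi in y]

def store(pairs):
    """Superimpose all (x, y) pairs in one matrix by adding outer products."""
    rows, cols = len(pairs[0][1]), len(pairs[0][0])
    W = [[0.0] * cols for _ in range(rows)]
    for x, y in pairs:
        P = outer(y, x)
        for i in range(rows):
            for j in range(cols):
                W[i][j] += P[i][j]
    return W

def recall(W, x):
    """Matrix-vector product W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def perturb(pairs, k, j, delta=1.0):
    """Copy of pairs with delta added to element j of the k-th input vector."""
    new_pairs = [(list(x), list(y)) for x, y in pairs]
    new_pairs[k][0][j] += delta
    return new_pairs

def changed_cells(W_a, W_b):
    """Set of (row, column) positions at which two matrices differ."""
    return {(i, j)
            for i, row in enumerate(W_a)
            for j, value in enumerate(row)
            if value != W_b[i][j]}

# Two orthonormal input vectors and two arbitrary output vectors.
x1, y1 = [0.5, 0.5, 0.5, 0.5], [1.0, 2.0]
x2, y2 = [0.5, -0.5, 0.5, -0.5], [3.0, -1.0]
pairs = [(x1, y1), (x2, y2)]

W = store(pairs)
print("W =", W)
print("recall(x1) =", recall(W, x1))
print("recall(x2) =", recall(W, x2))

# For each stored pair k and each input position j, see which cells move.
for k in range(len(pairs)):
    for j in range(len(x1)):
        cells = changed_cells(W, store(perturb(pairs, k, j)))
        print("pair", k, "element", j, "->", sorted(cells))
```

Note that there is no block of W that belongs to the first pair and another that belongs to the second: both associations are carried by the same eight numbers, which is the sense of superimposition at issue.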
Clearly, a given distributing transformation yields a
whole space of functions resulting from applying that
transformation to different inputs (i.e., different functions
F). If we think of these output functions as descriptions of
representations, and the input functions as descriptions of
items to be represented, the distributing transformation is
defining a whole space or scheme of distributed
representations. To be a distributed representation, then, is
to be a member of such a scheme; it is to be a representation
R of a series of items C such that the encoding process which
generates R on the basis of C implements a given distributing
transformation.
Basically, then, distributed representations are what you get
from distributing transformations, which are
transformations which make each part of the output (the
representation) depend on every part of the input (what
you're representing). Now, mathematically speaking, there is
a vast number of different kinds of distributing
transformations, and so there is a vast number of possible
instantiations of distributed representation. Connectionists
can be seen as exploring that portion of the space of possible
transformations that you can handle with n-dimensional
vector operations, learning algorithms, etc. In other domains
such as optics it is possible to implement other forms of
distributing transformations and hence to get distributed
representations with different properties.
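
As a rough illustration of the strong case, the sketch below uses the discrete Fourier transform as a stand-in for the Fourier transform mentioned earlier. The substitution, the example values, and the function names (dft, outputs_affected) are my assumptions for the purposes of illustration. It checks numerically that changing the input F at any single point changes the output G at every point, which is the dependency structure that "strongly distributing" is meant to capture.

```python
import cmath

def dft(f):
    """Discrete Fourier transform: g[k] = sum_n f[n] * exp(-2*pi*i*k*n/N)."""
    n_points = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]

def outputs_affected(f, m, delta=1.0, tol=1e-9):
    """Indices k at which the DFT output changes when delta is added to f[m]."""
    g = dft(f)
    f_changed = list(f)
    f_changed[m] += delta
    g_changed = dft(f_changed)
    return [k for k in range(len(f)) if abs(g_changed[k] - g[k]) > tol]

# Arbitrary example input; any values would do.
signal = [0.0, 1.0, 0.5, -2.0, 3.0, 0.25, -1.0, 4.0]
all_points = list(range(len(signal)))

# True if a change at every single input point reaches every output point.
strongly_distributing = all(
    outputs_affected(signal, m) == all_points for m in all_points
)
print("every output point varies with every input point:", strongly_distributing)
```

The reason the check succeeds is that a change of size delta at input point m shifts output point k by delta * exp(-2*pi*i*k*m/N), which has magnitude |delta| for every k. A numerical check on one input is of course only an illustration of the definition, not a proof.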
There are a number of reasons for wanting to define
distributed representation in terms of superimposition
generally, and distributing transformations in particular:
(a) superimposition is certainly one of the most common of
the standard senses of "distribution" in current usage, and so
we remain as close as possible to that usage;
(b) superimposition admits of a precise mathematical
definition, so those who think clarity only comes from
formalization should be kept happy;
(c) various popular properties of distributed representation
such as automatic generalization and graceful degradation are
a natural consequence of distribution defined this way;
(d) in practice, in a connectionist context, distribution in the
sense of requiring many units rather than just one is a
necessary precondition of this more full-blooded notion;
hence any advantages that accrue to representations in virtue
of utilizing many units also accrue to superimposed
representations;
(e) a number of other interesting theoretical results follow
from defining distribution this way: in particular, it can be
shown that distributed representations cannot be symbolic in
nature, on a reasonably precise definition of "symbolic" (see
e.g. my "Why distributed representation is inherently non-
symbolic", in G. Dorffner (ed.) Konnektionismus in Artificial
Intelligence und Kognitionsforschung. Berlin: Springer-
Verlag, 1990; 58-66).
On the basis of this kind of definition of what distributed
representation is, what kind of answer can be given to the
"what are distributed representations good for?" question?
Well, the kind of answer you will find satisfying will depend
very much on what your theoretical interests are. A
connectionist whose concerns have more of an applied,
engineering focus will want to know what specific processing
benefits arise from using representations generated by
distributing transformations. As mentioned in (c) above, I
think that some of the favorite virtues of distribution are
best seen as an immediate consequence of superimposition.
The technical issues here still need much clarification,
however.
As a cognitive scientist, on the other hand, I'm interested in
more general questions such as - what are the advantages of
distribution for human knowledge representation? Here I
don't have any actual answers ready to hand; the most I can
do at the moment is point to the kind of question that seems the
most interesting. Speaking at the broadest possible level:
various difficulties encountered in mainstream AI, combined
with some philosophical reflections, suggest that everyday
commonsense knowledge cannot be fully and effectively
captured in any kind of purely symbolic format; that, in other
words, symbolic representation is fundamentally the wrong
medium for capturing at least certain kinds of human
knowledge. Just above I mentioned that distributed
representation (defined in terms of superimposition) can be
shown to be intrinsically non-symbolic. The obvious
suggestion then is: perhaps the most important advantage of
distributed representation is that it (and it alone?) is capable
of representing the kind of knowledge that underlies everyday
human competence?
Tim van Gelder