distributed reps

David Chalmers dave at cogsci.indiana.edu
Fri Jun 7 22:03:50 EDT 1991


Properties like damage resistance, graceful degradation, etc., are all nice,
useful, cognitively plausible possibilities, but I would have thought that
by far the most important property of distributed representation is the
potential for systematic processing.

Obviously ultra-local systems (every possible concept represented by an
arbitrary symbol) don't allow much systematic processing, as each symbol
has to be handled by its own special rule: e.g. <if "CAT" do this>, <if
"DOG" do that> (though things can be improved somewhat by connecting the
symbols up, as e.g. in a semantic network).  Things are much improved by
using compositional representations, as e.g. found in standard AI.  If
you represent many concepts by compounding the basic tokens, then certain
semantic properties can be reflected in internal structure -- e.g.
"LOVES(CAT, DOG)" and "LOVES(JOHN,BILL)" have relevantly similar internal
structures -- opening the door to processing these structures in systematic
ways.
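
(To make that contrast concrete, here's a toy sketch -- Python used purely
for illustration, with made-up symbols and rules.  In the ultra-local scheme
every token needs its own hand-written rule; in the compositional scheme a
single rule keys off the shared internal structure of LOVES(CAT, DOG) and
LOVES(JOHN, BILL).)

    # Ultra-local: every concept is an arbitrary, unanalysable token, so
    # each one needs its own dedicated rule.
    LOCAL_RULES = {
        "CAT": lambda: "do this",
        "DOG": lambda: "do that",
        # ... one hand-written rule per symbol
    }

    def process_local(symbol):
        return LOCAL_RULES[symbol]()      # unknown symbol -> no behaviour

    # Compositional: concepts are compounds of basic tokens, so a single
    # rule can exploit the shared internal structure.
    def process_compositional(expr):
        predicate, (arg1, arg2) = expr    # e.g. ("LOVES", ("CAT", "DOG"))
        if predicate == "LOVES":
            return arg1 + " loves " + arg2

    print(process_local("CAT"))                                 # do this
    print(process_compositional(("LOVES", ("CAT", "DOG"))))     # CAT loves DOG
    print(process_compositional(("LOVES", ("JOHN", "BILL"))))   # JOHN loves BILL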

Distributed representations just take this idea a step further.  One
sees the systematicity made possible by giving representations internal
structure as above, and says "why stop there?" -- why not give every
representation internal structure (why should CATs and DOGs miss out?).
Compositional representations as above only represent a limited range of
semantic properties systematically in internal structure -- namely,
compositional properties.  All kinds of other semantic properties might be
fair game.  By moving to e.g. a vectorial representation for every concept,
the similarity structure of the semantic space can be reflected in the
similarity structure of the representational space, and so on.  And
it turns out that you can process compositional properties systematically
too (though not quite as easily).  The combination of a multi-dimensional
space with a large supply of possible non-linear operations seems to open
up a lot of possible kinds of systematic processing, essentially because
these operations can chop up the space in ways that standard operations on
compositional structures can't.
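
(Again a toy numerical sketch of that last point; the feature vectors here
are hand-picked and purely illustrative.  With arbitrary one-hot symbols
every pair of distinct concepts is equally dissimilar, whereas distributed
vectors let semantic neighbours come out as near neighbours in the
representational space.)

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Ultra-local (one-hot) codes: every distinct concept is equally unrelated.
    cat_local, dog_local, car_local = np.eye(3)
    print(cosine(cat_local, dog_local), cosine(cat_local, car_local))  # 0.0 0.0

    # Distributed codes over (made-up) semantic features: similar concepts
    # get similar vectors, so similarity in meaning shows up as proximity
    # in the representational space.
    #                animate furry self-propelled metallic
    cat = np.array([  1.0,   0.9,   0.8,           0.0])
    dog = np.array([  1.0,   0.8,   0.9,           0.0])
    car = np.array([  0.0,   0.0,   1.0,           1.0])
    print(cosine(cat, dog))   # close to 1: CAT and DOG are near neighbours
    print(cosine(cat, car))   # much lower: CAT and CAR sit far apart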

The proof is in the pudding, i.e. the kinds of systematic processing that
connectionist networks exhibit all the time.  Most obviously, automatic
generalization: new inputs are slotted into some representational form,
hopefully leading to reasonable behaviour from the network.  Similarly for
dealing with old inputs in new contexts.  By comparison, with ultra-local
representations, generalization is right out (except by assimilating new
inputs into an old category, e.g. by nearest neighbour methods).  Using
compositional representations, certain kinds of generalization are obviously
possible, as with decision trees.  These suffer a bit from having to deal
directly with the original input space, rather than developing a new
representational space as with dist reps: so you (a) don't get the very
useful capacity to take a representation that's developed and use it for
other purposes (e.g. as context for a recurrent network, or as input for
some new network), and (b) are likely to have problems on very large input
spaces (anyone using decision trees for vision?).  Both (a) and (b) suggest
that decision trees may be unlikely candidates for the backbone of a
cognitive architecture (conversely, the ability of connectionist networks
to transform one representational space into another is likely to be key
to their success as a cognitive architecture).  As for generalization
performance, that's an empirical matter, but the results of Dietterich et al.
seem to indicate that decision trees don't do quite as well, presumably
because of the limited ways in which they can chop up a representational
space (nasty rectangular blocks vs groovy flexible curves).  There's far
too much else that could be said, so apart from the quick sketch of point (a)
below, I'll stop here.
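
(The weights below are random and untrained and the sizes are made up --
the point is purely structural: the hidden layer of a network is itself a
new representational space, and nothing stops you handing that representation
to a second network, or feeding it back in as context the way a simple
recurrent network does.)

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(n_in, n_out):
        # A dense layer with a tanh non-linearity; weights are random and
        # untrained, since only the shapes of the spaces matter here.
        W = rng.normal(scale=0.5, size=(n_in, n_out))
        return lambda x: np.tanh(x @ W)

    encode = layer(100, 20)      # raw input space -> developed representation
    task_a = layer(20, 5)        # one network reads that representation ...
    task_b = layer(20 + 20, 5)   # ... another reads it alongside a context
                                 # vector, as a simple recurrent net would

    x = rng.normal(size=100)               # a raw input
    h = encode(x)                          # the developed 20-d representation
    out_a = task_a(h)                      # reused for one purpose
    context = np.zeros(20)                 # previous hidden state (zeros here)
    out_b = task_b(np.concatenate([h, context]))   # reused, with context
    print(out_a.shape, out_b.shape)        # (5,) (5,)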

Dave Chalmers.

