Dynamic binding

Mark St. John stjohn at cogsci.UCSD.EDU
Fri Nov 5 15:17:27 EST 1993


The sort of "holistic" binding that happens in the hidden layer of a
3-layer network and that Graham Smith describes is not particularly
new, at least if you look in the right place.  As Smith says, this is
just the sort of binding that happens in one of Jordan Pollack's RAAM
networks (1990, Artificial Intelligence), and it's much the same as
the sort of binding that happens in the hidden layer in the language
comprehension systems that Jay McClelland and I (1990, Artificial
Intelligence) have developed and that Risto Miikkulainen and Michael
Dyer (1991, Cognitive Science) have developed.  An example of binding
in these models would be to give them a sentence like, "Ricky played a
trick on Lucy," and observe that the model correctly binds Ricky to
the agent role and Lucy to the recipient role.  Then you give the
opposite sentence (Lucy played a trick on Ricky) and observe that the
bindings are reversed.
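
To make the flavor of this concrete, here is a minimal sketch (my own
toy illustration, not the architecture or training regime of any of the
models cited above; the vocabulary, the voice marker, the layer sizes,
and the learning parameters are all arbitrary assumptions).  A plain
3-layer network is trained to map a sentence, coded as surface slots
plus a voice marker, onto agent/patient role fillers, so the Ricky/Lucy
bindings are carried implicitly by the hidden layer rather than by any
explicit binding units:

import numpy as np

rng = np.random.default_rng(0)

NOUNS  = ["ricky", "lucy", "fred", "ethel"]
VERBS  = ["tricked", "helped", "chased"]
VOICES = ["active", "passive"]

def one_hot(item, vocab):
    v = np.zeros(len(vocab))
    v[vocab.index(item)] = 1.0
    return v

def encode_sentence(first, verb, second, voice):
    # surface form: first noun slot, verb slot, second noun slot, voice marker
    return np.concatenate([one_hot(first, NOUNS), one_hot(verb, VERBS),
                           one_hot(second, NOUNS), one_hot(voice, VOICES)])

def encode_roles(agent, patient):
    # target "meaning": filler of the agent role, then filler of the patient role
    return np.concatenate([one_hot(agent, NOUNS), one_hot(patient, NOUNS)])

def make_corpus(exclude=()):
    # every noun/verb/noun proposition, expressed in both voices; the voice
    # marker determines how surface positions map onto thematic roles
    pairs = []
    for agent in NOUNS:
        for verb in VERBS:
            for patient in NOUNS:
                if agent == patient or (agent, verb, patient) in exclude:
                    continue
                pairs.append((encode_sentence(agent, verb, patient, "active"),
                              encode_roles(agent, patient)))
                pairs.append((encode_sentence(patient, verb, agent, "passive"),
                              encode_roles(agent, patient)))
    return np.array([x for x, _ in pairs]), np.array([y for _, y in pairs])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_hidden=40, epochs=8000, lr=1.0):
    # plain batch backprop with a cross-entropy error at the output layer
    W1 = rng.normal(0.0, 0.3, (X.shape[1] + 1, n_hidden))
    W2 = rng.normal(0.0, 0.3, (n_hidden + 1, Y.shape[1]))
    Xb = np.hstack([X, np.ones((len(X), 1))])            # inputs plus bias
    for _ in range(epochs):
        H  = sigmoid(Xb @ W1)                            # the "holistic" layer
        Hb = np.hstack([H, np.ones((len(H), 1))])
        O  = sigmoid(Hb @ W2)
        d_out = O - Y                                    # cross-entropy delta
        d_hid = (d_out @ W2[:-1].T) * H * (1.0 - H)
        W2 -= lr * Hb.T @ d_out / len(X)
        W1 -= lr * Xb.T @ d_hid / len(X)
    return W1, W2

def read_roles(W1, W2, sentence):
    h = sigmoid(np.append(sentence, 1.0) @ W1)
    o = sigmoid(np.append(h, 1.0) @ W2)
    agent   = NOUNS[int(np.argmax(o[:len(NOUNS)]))]
    patient = NOUNS[int(np.argmax(o[len(NOUNS):]))]
    return agent, patient

X, Y = make_corpus()
W1, W2 = train(X, Y)
# Swapping the surface positions should swap the role bindings:
print(read_roles(W1, W2, encode_sentence("ricky", "tricked", "lucy", "active")))
print(read_roles(W1, W2, encode_sentence("lucy", "tricked", "ricky", "active")))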

One serious issue/limitation of this holistic binding method, however,
is how well it generalizes to novel cases: How many of the total
possible sentences have to be trained so that the remaining sentences
will be processed correctly?  The problem is that the sentences
missing from the training set themselves create regularities that the
model can learn: as far as the hidden units are concerned, those
sentences do not (and therefore cannot) occur.  The question, once the
network has been trained, is which force is stronger: the
generalization to novel cases, or the learned regularity that the
novel case cannot happen?  For example, say the network is
never trained on "Lucy played a trick on Ricky."  The model learns
many other sentences that suggest how to map sentence constituents to
thematic roles, but it also learns that playing tricks is a one-way
sort of deal because Lucy never seems to play them.  Now if we give
the Lucy sentence as a generalization test case, part of the model wants
to generalize and activate the systematic meaning based on all that it
learned about sentence comprehension, but another part of the model
wants to correct this obvious error in the input because it knows that
this sentence and meaning are unlikely.  The model can "correct the
error" by flipping the agent and recipient to the better known
arrangement or by changing the verb to a more likely alternative, etc.
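
Continuing the toy sketch above (again, just an illustration of the
issue, not a report of our actual simulations), one can hold out every
sentence in which Lucy plays the trick, retrain, and then probe one of
the held-out sentences to see which influence wins:

# Hold out every sentence in which Lucy is the one playing the trick,
# then probe one of them after training.
held_out = {("lucy", "tricked", p) for p in NOUNS if p != "lucy"}
X_train, Y_train = make_corpus(exclude=held_out)
W1, W2 = train(X_train, Y_train)

agent, patient = read_roles(W1, W2,
                            encode_sentence("lucy", "tricked", "ricky", "active"))
print("agent:", agent, " patient:", patient)
# ("lucy", "ricky") would mean the systematic mapping won and generalized;
# ("ricky", "lucy") would mean the network "corrected the error" toward
# the agent/patient arrangement it was actually trained on.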

Which part of the model (or better put, which influence) wins depends
on many factors, such as the number of hidden units, the cost
function, the combinatorial nature of the training corpus, the size of
the training corpus, and so on.  It turns out that to achieve the sort
of systematic mapping we want, we need to use A LOT of hidden units
(yes, more is better for this kind of generalization), a large training
set, and critically, a reasonably combinatorial corpus so that each
element/word is paired with some variety of other elements (in
statistical terms, you need to break as many high-order regularities
as possible).  See St. John (1993, Cognitive Science Proceedings) for
some discussion.  I'm a little embarrassed to toot my own horn, but
I've thought about this some, and these papers may be of some interest
-- in addition to Janet Wiles's paper (1993, Cognitive Science
Proceedings) that John Colen mentioned.
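
In the same toy setting (and with the same caveat that the sizes and
holdout counts here are arbitrary), the trade-off can be explored
directly by varying the number of hidden units and the number of
propositions withheld from the combinatorial corpus, and scoring how
many of the withheld sentences still receive the systematic role
bindings:

import itertools

def generalization_score(n_hidden, n_held_out, seed=1):
    # withhold a random subset of propositions, train on the rest, and test
    # whether the withheld sentences still get the systematic role bindings
    local_rng = np.random.default_rng(seed)
    props = [(a, v, p) for a in NOUNS for v in VERBS for p in NOUNS if a != p]
    order = local_rng.permutation(len(props))
    held_out = {props[i] for i in order[:n_held_out]}
    X_train, Y_train = make_corpus(exclude=held_out)
    W1, W2 = train(X_train, Y_train, n_hidden=n_hidden)
    correct = 0
    for agent, verb, patient in held_out:
        got = read_roles(W1, W2, encode_sentence(agent, verb, patient, "active"))
        correct += (got == (agent, patient))
    return correct / len(held_out)

for n_hidden, n_held_out in itertools.product([5, 40], [4, 12]):
    print(n_hidden, n_held_out, generalization_score(n_hidden, n_held_out))

In this toy version one would expect the larger hidden layer and the
more nearly complete combinatorial training set to generalize better,
which is the same pattern described above at a much larger scale.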

One final point I'd like to raise is that this tension between
generalization (along the lines of a systematic mapping) and "error
correction" is not all bad.  There is considerable psycholinguistic
evidence that these sorts of "error corrections" happen with some
frequency.  People mis-hear, mis-read, mis-remember, see what they
want to see, etc. all the time.  On the other hand, we can all
understand the infamous sentence "John ate the guitar" even though
we've presumably never seen such a thing before and it's pretty
unlikely.  This ability, however, may simply attest to the wide
variety and combinatorics of our language training.  Why it is that we
mis-read on some occasions and comprehend the systematic meaning, as
with John eating the guitar, on other occasions is not well
understood.  Training is probably involved, and attention is probably
involved, to name two factors.  We're currently working on models and
human experiments to understand this issue better.

-Mark St. John
Dept. of Cognitive Science, UCSD

