Parsing embedded sentences ...

Josep M Sopena D4PBJSS0%EB0UB011.BITNET at BITNET.CC.CMU.EDU
Fri Dec 20 18:13:03 EST 1991


 
 
 
 
  The following paper is now available. To obtain a copy, send a
 message to "d4pbjss0 at e0ub011.bitnet".
 
 
 
 
        ESRP: A DISTRIBUTED CONNECTIONIST PARSER THAT USES
            EMBEDDED SEQUENCES TO REPRESENT STRUCTURE
 
 
                      Josep M Sopena
 
              Departament de Psicologia Basica
 
                  Universitat de Barcelona
 
 
 
In this paper we present a neural network that is able to
compute a certain type of structure which, among other things,
allows it to assign thematic roles adequately and to find the
antecedents of traces, pro, PRO, anaphors, pronouns, etc. for an
extensive variety of syntactic structures.
 
Up until now, the types of sentences that the network has been
able to parse include:
 
1. 'That' sentences with several levels of embedding.
 
     John says that Mary thought that Peter was ill.
 
2. Passive sentences.
 
3. Relative clauses with several levels of embedding
   (center-embedded).
 
    John loved the girl that the carpenter who the builder hated
    was seeing.
 
    The man that bought the car that Peter wanted was crazy.
 
    The man the woman the boy hates loves is running.
 
4. Syntactic ambiguity in the attachment of PPs.
 
    John saw a woman with a handbag with binoculars.
 
5. Combinations of these four types of sentences:
 
    John bought the car that Peter thought the woman with a
    handbag wanted.
 
The input consists of the sentence presented word by word. The
patterns in the output represent the structure of the sentence.
The structure is not represented by a static pattern but by a
temporal course of patterns. This evolution of the output is
based on several kinds of psychological evidence and works as
follows: the output is a sequence of simple semantic predicates
(although it could be thought of in a more syntactic way). Each
element of the output sequence consists of a single predicate,
which always has to be complete. Since clauses often contain
omitted elements (e.g. traces, PRO, pro, etc.), the network
retrieves these elements in order to complete the current
predicate.
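
As an illustration, the target sequence for the first example
sentence above could be written symbolically as follows. This
shorthand and the role labels are our own; the paper encodes
predicates as distributed patterns, not symbols.

    # Sketch of the output as a temporal sequence of simple,
    # complete predicates for "John says that Mary thought that
    # Peter was ill". Notation, role labels, and the clause links
    # are illustrative assumptions, not the paper's coding.
    target_sequence = [
        {"predicate": "say",   "agent": "John", "theme": "<clause-2>"},
        {"predicate": "think", "agent": "Mary", "theme": "<clause-3>"},
        {"predicate": "ill",   "theme": "Peter"},
    ]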
 
These two mechanisms, segmentation into simple predicates and
retrieval of previously processed elements, are what allow
structure to be computed. In this way the structure is not
conceived solely as a linear sequence of simple predicates,
because these mechanisms make it possible to form embedded
sequences (embedded structures).
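
To illustrate how retrieval gives rise to embedded sequences,
consider the doubly center-embedded example above. A possible
unfolding, in the same illustrative shorthand (our
reconstruction, not the paper's notation), is:

    # "The man the woman the boy hates loves is running"
    # Input arrives word by word; each output element is a single
    # complete predicate, with the arguments that are missing in
    # the input retrieved from the network's short-term memory.
    embedded_sequence = [
        {"predicate": "hate", "agent": "boy",   "patient": "woman"},
        {"predicate": "love", "agent": "woman", "patient": "man"},
        {"predicate": "run",  "agent": "man"},
    ]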
 
The paper also includes empirical evidence supporting the
model's plausibility as a psychological model.
 
The network is formed by two parallel modules that share all of
the output and part of the input. The first module is a standard
Elman network that maps the elements of the input onto their
predicate representation in the output and assigns the
corresponding semantic roles. The second module is a modified
Elman network with two hidden layers. The units of the first
hidden layer (the layer that is copied) have a linear activation
function. This type of network has a much greater short-term
memory capacity than a standard Elman network. It stores the
sequence of predicates and retrieves the elements of the current
predicate that are omitted in the input (traces, PRO, etc.), as
well as the referents of pronouns and anaphors. When a pronoun
or an anaphor appears in the input, its antecedent in the
sentence, retrieved from this second module, is placed in the
output. This module also allows the network to build embedded
sequences by retrieving earlier elements of the sequence.
 
The two modules were trained simultaneously. There were no
manipulations other than the changes of inputs and targets, as
in the standard backpropagation algorithm.
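
A minimal sketch of the two-module architecture follows,
assuming plain numpy with sigmoid output units. Layer sizes,
weight ranges, the way the non-shared part of the input is
split, and the summing of the two modules' contributions at the
shared output are all our assumptions; the abstract does not
specify them.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class ElmanModule:
        # Standard Elman network: the hidden layer is copied into
        # a context layer that is fed back at the next time step.
        def __init__(self, n_in, n_hid, n_out, rng):
            self.W_in = rng.uniform(-0.1, 0.1, (n_hid, n_in))
            self.W_ctx = rng.uniform(-0.1, 0.1, (n_hid, n_hid))
            self.W_out = rng.uniform(-0.1, 0.1, (n_out, n_hid))
            self.context = np.zeros(n_hid)

        def step(self, x):
            h = sigmoid(self.W_in @ x + self.W_ctx @ self.context)
            self.context = h.copy()
            return self.W_out @ h    # contribution to shared output

    class LinearContextModule:
        # Modified Elman network with two hidden layers; the first
        # (copied) hidden layer is linear, which the paper credits
        # with a much larger short-term memory capacity.
        def __init__(self, n_in, n_hid1, n_hid2, n_out, rng):
            self.W_in = rng.uniform(-0.1, 0.1, (n_hid1, n_in))
            self.W_ctx = rng.uniform(-0.1, 0.1, (n_hid1, n_hid1))
            self.W_12 = rng.uniform(-0.1, 0.1, (n_hid2, n_hid1))
            self.W_out = rng.uniform(-0.1, 0.1, (n_out, n_hid2))
            self.context = np.zeros(n_hid1)

        def step(self, x):
            h1 = self.W_in @ x + self.W_ctx @ self.context  # linear
            self.context = h1.copy()
            h2 = sigmoid(self.W_12 @ h1)
            return self.W_out @ h2   # contribution to shared output

    def parser_step(m1, m2, x_shared, x_private):
        # Both modules see the shared part of the input; here the
        # first module also receives the private part (an
        # assumption). Their contributions are summed at the
        # shared output layer.
        y1 = m1.step(np.concatenate([x_shared, x_private]))
        y2 = m2.step(x_shared)
        return sigmoid(y1 + y2)

    rng = np.random.default_rng(0)
    m1 = ElmanModule(n_in=30, n_hid=60, n_out=40, rng=rng)
    m2 = LinearContextModule(n_in=20, n_hid1=60, n_hid2=60,
                             n_out=40, rng=rng)
    y = parser_step(m1, m2, np.zeros(20), np.zeros(10))  # one word

Under this reading, simultaneous training simply means that the
error at the shared output is backpropagated into both modules'
weights at every word.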
 
The network was trained on 3000 sentences built from a
vocabulary of 1000 words. The number of sentences that can be
built from this vocabulary is 10^15. Generalization was
completely successful for a test set of 800 sentences
representing the variety of syntactic patterns in the training
set.
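
For scale, one way to read the 10^15 figure (the abstract does
not give the derivation, so this is only our reading) is as
roughly five independent lexical choices per sentence from the
1000-word vocabulary:

    1000^5 = (10^3)^5 = 10^15

On that reading, the 3000 training sentences sample only a
vanishingly small fraction of the space the grammar generates.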
 
The model bears some relationship to the idea of representing
structure not only in space but also in time (Hinton, 1989) and
to the RAAM networks of Pollack (1989). The shortcomings of
these types of networks are also discussed.

