past tense debate
    Virginia Marchman 
    marchman at amos.ling.ucsd.edu
       
    Mon Sep  5 18:04:26 EDT 1988
    
    
  
Jumping in on the recent discussion about connectionism and
the learning of the English past tense,  I would like to
make the following 2 points:
(1)   The data on acquisition of the past tense in real
      children may be very different from the patterns
      assumed by either side in this debate.
(2)   Networks can simulate "default" strategies that mimic
      the categorial rules defended by P&P, but the
      emergence of such rule-like behavior can depend
      on statistical properties of the input language
      (a constant input, not the discontinuous input
      used by R&M).  This finding may be relevant
      to discussions for both "sides" in light of the
      behavioral (human) data I allude to in (1).
(1)   As a psychologist interested in the empirical facts
which characterize the acquisition of the past tense (and
other domains of linguistic knowledge), I agree with
McClelland's comment directed to Pinker and Prince that
	> There's quite a bit more empirical research to be
	> done [to] even characterize accurately the facts about
	> the past tense.  I believe this research will
	> show you that you have substantially 
	> overstated the empirical situation in several respects.
         (Re: reply to S. Harnad, Connectionist Net, 8/31/88)
After OLC was released in tech report form (Occasional
Paper #33, 1987), I wrote a paper arguing that P&P may have 
underestimated the complexity and degree of individual variation 
inherent in the process of acquiring the English past tense
("Rules and Regularities in the acquisition of the English
past tense." Center for Research in Language Newsletter, 
UCSD, vol. 2, #4, April, 1988).  However, it is difficult for
me to believe that developmental data are (in fact, or in principle)
"too impoverished" to substantively contribute to the debate
between the symbolic and connectionist accounts (S. Harnad,
"On Theft vs. Honest Toil", Connectionist Net, 8/31/88). 
In the paper, I presented data on the production of past
tense forms by English-speaking children between the
ages of 3 and 8, using an elicitation technique essentially
identical to the one used by Bybee & Slobin (i.e., the data
cited in the original R&M paper).  While I was fully expecting
to see the standard "stages" of overgeneralization and "U-shaped"
development, the data suggested that I should stop and
re-think the standard characterization of the acquisition of
inflectional morphology.  First, my data indicated that
a child can be in the "stage" of overgeneralizing the "add -ed"
rule anywhere between 3 and 7 years of age.
Second, errors took several forms beyond the one emphasized
by P&P, i.e. overgeneralization of the "-ed" rule to irregular forms.
Instead, errors seem to result from the misapplication of
*several* (at least two) past tense formation processes.
For example, identity mapping (e.g. "hit --> hit")
was incorrectly applied to forms from several different
classes (both regulars and irregulars that require a vowel change).
Vowel changes were inappropriately applied
to regulars and irregulars alike (including examples
like "pick --> puck").  Furthermore, children committed
these "irregularizations" of regular forms at the same time
(i.e., within the same child) that they also committed
the better-known error of regularizing irregular forms.
Although individual children had "favorite" error types,
the different error patterns were not concentrated in any
particular age range.  These data provide two challenges to the
stage model so often assumed by investigators on either
side of the symbolic/connectionist debate:
	(a) Why is it that children with very *different* amounts
	    of linguistic experience (e.g., 4 year olds and 7
	    year olds) over- and undergeneralize verbs in
	    qualitatively similar ways?  This degree of
	    individual variation within and across age levels
	    in "rate" of acquisition among normal children may
	    be outside acceptable levels of tolerance for a
	    stage model.  At the very least, additional evidence
	    is needed before one can conclude that acquisition
	    proceeds in a "U-shaped" fashion from rote to rule
	    to rule+rote mechanisms. 
	(b) In several interesting ways, children can be shown
	    to treat irregular and regular verbs similarly
	    during acquisition.  Exactly what evidence does
	    one need to show that the regular transformation
	    (add -ed) has a privileged status *during acquisition*?
	    Although overextension of the -ed rule is the most
	    frequent error type overall, there was little in my
	    data upon which to claim that regulars and irregulars
	    are *qualitatively* different at any point in
	    the learning process.
As I state in the conclusion:
".... addressing at least some of the interesting questions
for language acquisition requires looking beyond what children
are supposed to be doing within any one "stage" of development.
I emphasized the idiosyncratic and multi-faceted nature of children's
rule-governed systems and asked whether the three-phased
model is the most useful metaphor for understanding how children
deal with the complexities inherent in the *systems* of language
at various points in development.  Rather than looking for ways
to explain qualitative changes in rule types and their domain
of operation, it may be more useful to shift theoretical emphasis
onto acquisition as a protracted resolution of several competing
and interdependent sub-systems."
(2) In a Technical Report that will be available in the next
4-6 weeks ("Pattern Association in a Back Propagation
Network:  Implications for Child Language Acquisition",
Center for Research in Language, UCSD), Kim Plunkett
(psykimp%dkarh02.bitnet) and I will report on a series
of approx. 20 simulations conducted during the last 8 months
at UCSD.  Our goal was to extend the original R&M work
with particular focus on the developmental aspects of the model by
exploring the interaction of input assumptions with the specific
learning properties of the patterns that the simulation is required
to associate from input to output.  Our first explorations
in this problem confirmed the claim by P&P (OLC), that
the U-shaped developmental performance of the R&M simulation
was indeed highly sensitive to the discontinuity in vocabulary size
and structure imposed upon the model.  In our simulations,
we did NOT introduce any "artificial" discontinuities in
the input to the network across the learning period.
We restricted ourselves to mappings between phonological
strings -- although we agree with both P&P and McClelland
that children use more sources of information (e.g. semantics)
in the acquisition of an inflectional system like the past
tense.  It is certainly not our goal to suggest that 
linguistic categories (i.e. phonology, semantics) play no role
in the acquisition of language, nor that a connectionist network
that is required to perform phonological-to-phonological mappings
is faced with the same task as a child learning language.
But the results from these simulations may present
useful information about the
effects of different input characteristics on the kinds of
errors a net will produce -- including some understanding
of the conditions under which "rule-like" behaviors
will and will not emerge.  And, these error patterns (and the
individual variability obtained -- where different
simulations stand for different individuals) can
shed some light on the "real" phenomena that are of the
most concern.  In our mixture of approaches, we are trying
to systematically explore the assumptions of both the
symbolic and connectionist approaches to acquisition, keeping
what kids "really" do firmly in mind.
For our simulations, we constructed a language
that consists of legal English CVC, VCC, and CCV strings.
Each present and past tense form was represented using
a fixed-length distributed phonological feature system.
The task for each network was to learn (using back-propagation)
approximately 500 phonological-to-phonological mappings
where the present tense forms are transformed to the past
tense via one of four types of "rules": Arbitrary (any phoneme can
go to any other phoneme, like GO --> WENT),
Vowel Change (12 possible English vowel changes, analogous
to COME --> CAME),  Identity map (no change, analogous to
HIT --> HIT), and the turning on of a suffix (one of three
depending on the voicing of the final phoneme in the stem,
analogous to WALK --> WALKED).  Input strings were
randomly assigned to verb classes and therefore, *no information
was provided which tells the network to which class
a particular verb belongs*.
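
The vocabulary construction described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not the
code used in the simulations: the phoneme inventory, the VOWEL_CHANGE
table, the suffix spellings ("t", "d", "Id") and all function names are
hypothetical, phonotactic legality is not checked, and plain strings
stand in for the distributed phonological feature vectors.

```python
import random

# Hypothetical, heavily simplified phoneme inventory (an assumption).
VOWELS = "aeiou"
VOICELESS = set("ptkfs")
VOICED = set("bdgvzmnlr")
CONSONANTS = sorted(VOICELESS | VOICED)
SHAPES = ("CVC", "VCC", "CCV")
# Stand-in for the 12 English vowel changes; these pairs are made up.
VOWEL_CHANGE = {"a": "u", "e": "o", "i": "a", "o": "e", "u": "i"}


def random_stem(rng):
    """Draw a random string with one of the three syllable shapes."""
    shape = rng.choice(SHAPES)
    return "".join(
        rng.choice(VOWELS if slot == "V" else CONSONANTS) for slot in shape
    )


def suffix_for(stem):
    """Pick one of three suffixes from the final phoneme of the stem."""
    last = stem[-1]
    if last in "td":
        return "Id"   # analogous to WANT --> WANTED
    if last in VOICELESS:
        return "t"    # analogous to WALK --> WALKED
    return "d"        # voiced consonants and vowels


def past_tense(stem, verb_class, rng):
    """Apply one of the four mapping types to a stem."""
    if verb_class == "identity":
        return stem
    if verb_class == "vowel_change":
        return "".join(VOWEL_CHANGE.get(ch, ch) for ch in stem)
    if verb_class == "suffix":
        return stem + suffix_for(stem)
    if verb_class == "arbitrary":
        past = random_stem(rng)
        while past == stem:
            past = random_stem(rng)
        return past
    raise ValueError(f"unknown verb class: {verb_class}")


def build_vocabulary(class_sizes, seed=0):
    """Assign unique random stems to classes, ignoring their phonology."""
    rng = random.Random(seed)
    stems = set()
    while len(stems) < sum(class_sizes.values()):
        stems.add(random_stem(rng))
    stems = sorted(stems)
    rng.shuffle(stems)
    supply = iter(stems)
    vocab = []
    for verb_class, size in class_sizes.items():
        for _ in range(size):
            stem = next(supply)
            vocab.append((stem, past_tense(stem, verb_class, rng), verb_class))
    return vocab
```

Because stems are shuffled before being handed out, nothing in the form
of a stem predicts its class, which is the property the paragraph above
emphasizes.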
One primary goal of this work was to outline the particular
configuration of vocabulary input (i.e. "diet") that allowed
the system to achieve "adult-like competence" in the past
tense, with "child-like" stages in between.  Across simulations,
we systematically varied the overall number of unique
forms that undergo each transformation (i.e., class size),
as well as the number of times each class member is presented to the
system per epoch (token frequency). We experimented with several
different class size and token ratios that, according to published
estimates, represent the vocabulary configuration
of the past tense system in English (e.g., arbitraries are
relatively few in number but are highly frequent).  We used
two measures of performance/acquisition after every sweep
through the vocabulary: 1) rate of learning (overall error rate),
and 2) success at achieving the target output forms
(overall "hit" rate, consonant "hits", vowel "hits" and
suffix "hits").  With these, we determined the degree to which the
network was achieving the target, as well as the tendency for
the network to, for example, turn on a suffix when
it shouldn't, change a vowel when it should identity map, etc.
*at every point along the learning curve*.
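
As a rough sketch of the two manipulations just described, the code
below builds one epoch of training items from per-class token
frequencies and scores outputs with the four "hit" measures.  Everything
here is an assumption made for illustration: forms are written as
(body, suffix) pairs of strings rather than feature vectors, and the
names build_epoch, score and hit_rates are hypothetical.

```python
import random

VOWELS = "aeiou"  # simplified vowel set, assumed for illustration


def build_epoch(vocab, token_freq, seed=0):
    """Repeat each (stem, past, verb_class) item by its class's token frequency."""
    rng = random.Random(seed)
    epoch = [item for item in vocab for _ in range(token_freq[item[2]])]
    rng.shuffle(epoch)
    return epoch


def score(target, output):
    """Compare one output with its target; both are (body, suffix) pairs."""
    t_body, t_suffix = target
    o_body, o_suffix = output
    same_length = len(t_body) == len(o_body)
    consonant = same_length and all(
        t == o for t, o in zip(t_body, o_body) if t not in VOWELS
    )
    vowel = same_length and all(
        t == o for t, o in zip(t_body, o_body) if t in VOWELS
    )
    suffix = t_suffix == o_suffix
    return {
        "overall": consonant and vowel and suffix,
        "consonant": consonant,
        "vowel": vowel,
        "suffix": suffix,
    }


def hit_rates(pairs):
    """Average each hit measure over a list of (target, output) pairs."""
    scores = [score(target, output) for target, output in pairs]
    if not scores:
        return {}
    return {key: sum(s[key] for s in scores) / len(scores) for key in scores[0]}
```

Calling hit_rates after every sweep would give a learning curve for each
measure separately, so a suffix turned on in error shows up as a suffix
miss even when the consonants and vowel are correct.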
I will not describe all of the results here; however, one
finding is particularly relevant to the current discussion.
In several of our simulations, the network tended to
adopt a "default" suffixation strategy when it formed
the past tense of verbs.  That is, even though the system
was getting a high proportion of both the "regular" and
the "irregular" (arbitrary, vowel change and identity)
verbs correct, the most common errors made by the system
at various points in development are best described as
overgeneralizations of the "add -ed" rule.  However, other
error types (analogous to the "irregularizations" described
above) also occurred.  Certain configurations of class
size (# of forms) and token frequency (# of exemplars repeated)
resulted in a network that adopted suffixation as its "default"
strategy; yet, in other simulations (i.e., vocabulary
configurations), the network adopted "identity mapping" as its guide
through the acquisition of the vocabulary.  Overgeneralizations
of the identity mapping procedure were prevalent in several
simulations, as was the tendency to incorrectly change a vowel.
It is important to stress that these different outcomes
occurred in the *same* network architecture (e.g., 3 layer,
20 input units, etc.), with each simulation exposed to a
different combination of regular and irregular input.
Emergence of a default
strategy (a rule?) at certain points in learning depended
not on tagging of the input (as P&P suggest), but on the ratio
of regulars and irregulars in the input to which the system was
exposed.  This pattern of performance could *not* have
been determined by the phonological characteristics of
members of either the regular or the irregular classes.
That is, phonological information was available to the
system (within the distributed feature representation)
but the phonological structure of the stem did not
determine class membership (i.e., performance
was not determined by the identifiability
of which "class" of relationships would obtain between
the input and the output). 
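
One way to make the notion of a "default" strategy concrete is to label
each incorrect output by the process it resembles and then take the most
frequent label.  The sketch below is a hypothetical illustration, not
the analysis used in the Technical Report: outputs are assumed to be
(body, suffix) pairs of strings, the category names are invented here,
and the vowel set is simplified.

```python
from collections import Counter

VOWELS = "aeiou"  # simplified vowel set, assumed for illustration


def classify_error(stem, target, output):
    """Label an output by the past tense process it most resembles.

    stem is a string; target and output are (body, suffix) pairs.
    """
    if output == target:
        return "correct"
    body, suffix = output
    if body == stem:
        # Stem left intact: either a suffix was added or nothing was done.
        return "suffixation" if suffix else "identity"
    consonants_kept = len(body) == len(stem) and all(
        s == b for s, b in zip(stem, body) if s not in VOWELS
    )
    if consonants_kept and not suffix:
        return "vowel_change"   # analogous to "pick --> puck"
    return "other"


def default_strategy(labels):
    """Return the most frequent error label, or None if there were no errors."""
    errors = Counter(label for label in labels if label != "correct")
    if not errors:
        return None
    return errors.most_common(1)[0][0]
```

Run on the outputs of one simulation at one point in training,
default_strategy would return "suffixation" for a network that mostly
overgeneralizes the suffix and "identity" for one that mostly leaves
stems unchanged.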
The input-sensitivity of error patterns in our simulations
may come as bad news to those who (1) care about what
children do, and (2) believe that children go through
a universal U-shaped pattern of development.
However, as I suggest in my CRL paper, this familiar
characterization of "real" children may not be the most
useful for understanding the acquisition process.
Default mappings, rule-like in nature, can emerge
in a system that is given no explicit information about
class membership (bad news for P&P?), but such an outcome
is by no means guaranteed.  Our current and future work includes
a comparison of this set of simulations with additional sets
in which information about class membership is explicitly
"tagged" in the system (as P&P assume), models in which
phonological similarity in the stem is varied systematically
(to determine whether default mappings still emerge), and
models in which semantic information is also available (as
everyone on earth assumes must be the case for
a realistic model of language learning).
Virginia Marchman
Department of Psychology C-009
UCSD
La Jolla, CA  92093
marchman at amos.ucsd.ling.edu
    
    