Discussion I: Reading & Neural Nets

Sun Nov 24 01:07:37 EST 1991

Here is the first of two exchanges concerning the Target Article
on Reading and Connectionism that appeared in PSYCOLOQUY 2.8.4
(retrievable by anonymous ftp from directory pub/harnad on
princeton.edu). Further commentary is invited. All contributions
will be refereed. Please submit to psyc at pucc.bitnet or
psyc at pucc.princeton.edu -- NOT TO THIS LIST.

PSYCOLOQUY V2 #9 (2.9.3 Commentary / Coltheart, Skoyles: 345 lines)
PSYCOLOQUY   ISSN 1055-0143     Sun, 24 Nov 91       Volume 2 : Issue   9.3
      2.9.3.1 Commentary on Connectionism, Reading.... / Coltheart
      2.9.3.2 Reply to Coltheart / Skoyles

----------------------------------------------------------------------

From: max.coltheart at mrc-applied-psychology.cambridge.ac.uk
Subject: 2.9.3.1 Commentary on Connectionism, reading.... / Coltheart

        Connectionist modeling of human language
	    processing: The case of reading

    (Commentary on Skoyles Connectionism, Reading and
     the Limits of Cognition PSYCOLOQUY 2.8.4 1991)

                  Max Coltheart
         School of Behavioural Sciences
             Macquarie University
            Sydney, NSW, Australia
  max.coltheart at mrc-applied-psychology.cambridge.ac.uk

Skoyles (1991) wrote in his Rationale (paragraph 8):

  "Connectionism shows that nonword reading can be done purely by
  processes trained on real words without the use of special
  grapheme-phoneme translation processes."

This is not the case. The connectionist model in question, that of
Seidenberg & McClelland (1989), reads nonwords very poorly after being
trained on words. Besner, Twilley, McCann and Seergobin (1990) tested
its reading of various sets of nonwords. The trained model got 51%, 59%
and 65% correct; people get around 90%. The Seidenberg and McClelland
paper itself does not report what rate of correct reading of nonwords
the model can achieve.

Skoyles also writes (paragraph 3):

  "Connectionist (PDP) neural network simulations of reading
  successfully explain many experimental facts found about word
  recognition (Seidenberg & McClelland, 1989)"

I would like to see a list of facts about reading that the PDP model
can explain; even more, I would like to see a list of facts about
reading that the traditional non-PDP dual-route model (which uses rules
and local representations) cannot explain but which the PDP model can.

Here is a list of facts which are all discussed in the Seidenberg &
McClelland paper, which can be explained by the dual-route model, but
which cannot be explained by the PDP model:

1. People are very accurate at reading aloud pronounceable nonwords.
This is done by using grapheme-phoneme rules in a dual-route model. As
I've already mentioned, the PDP model is not accurate at reading
nonwords aloud, so cannot explain why people are.

2. People are very accurate at deciding whether or not a pronounceable
letter string is a real word (lexical decision task). The PDP model is
very inaccurate at this: Besner et al. (1990) show that the model
achieves a miss rate of about 6% on words (typical of people) only at
the expense of a false alarm rate of over 80% on nonwords
(not typical of people). So the PDP model cannot explain why people are
so accurate at lexical decision.

3. After brain damage, reading in some people is affected in the
following way: nonword reading is still normal, but many exception
words, even quite common ones, are wrongly read. In addition, the
erroneous responses are the ones that would be predicted from applying
spelling-sound rules (e.g. reading PINT as if it rhymed with "mint").
This is surface dyslexia; two of the clearest cases are patients MP
(Bub, Cancelliere and Kertesz, 1985) and KT (McCarthy and Warrington,
1986). According to the dual-route explanation the lexical route for
reading is damaged but the nonlexical (rule-based) route intact.
Attempts have been made to simulate this by damaging the trained PDP
model (e.g., by deleting hidden units). These attempts have not
succeeded. It seems highly unlikely that they ever will succeed: Since
the damaged patients are about 95% right at reading nonwords, and the
intact model gets only around 60% right, is it likely that any form of
"lesion" to the model will make it much BETTER at reading nonwords?

4. After brain damage, reading in some people is affected in the
following way: Word reading is still good, but nonword reading is very
bad. This is phonological dyslexia. A clear case is that of Funnell
(1983); her patient could not read any nonwords at all, but achieved
scores of around 90% correct in tests of word reading. The dual-route
explanation would be that there was abolition of the nonlexical route
and sparing of the lexical route. Seidenberg & McClelland appeal to a
way (not implemented in their model) of reading from orthography
through meaning to phonology. This would of course fail for a
meaningless letter string, so anyone reading solely by such a route
would be able to read words but not nonwords. The explanation fails,
however, because in the case of phonological dyslexia referred to above
(Funnell, 1983), the patient also had a semantic impairment and would
have shown semantic confusions in reading aloud if he had been reading
semantically. He did not make such confusions. Therefore Seidenberg and
McClelland's reconciliation of phonological dyslexia with their model
cannot be correct.
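
As a rough illustration of point 2, the hit and false alarm rates can be
combined into the standard signal detection sensitivity index d'. The
sketch below assumes that the 6% figure is the miss rate on words (hence
a 94% hit rate) and takes 80% as the false alarm rate, although the text
says only "over 80%". The human-like comparison figures are hypothetical
and are not taken from Besner et al. (1990).

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Assumed reading of point 2: 6% misses on words (94% hits),
# paired with an 80% false alarm rate on nonwords.
model = d_prime(0.94, 0.80)

# Hypothetical human-like figures, for contrast only (not from the source).
reader = d_prime(0.94, 0.06)

print(f"model  d' = {model:.2f}")
print(f"reader d' = {reader:.2f}")
```

On these assumed figures the model's sensitivity comes out well below
that of the hypothetical reader, which is the sense in which the same hit
rate is bought "at the expense of" the false alarm rate.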

Pinker and Prince (1988) argued that any model which eschews explicit rules
and local (word or morpheme) representations would fail to explain the data
on children's learning of past tenses. I argue that any model which eschews
explicit rules and local (word or morpheme) representations will fail to
explain the data on adult skilled reading.

NETtalk (Sejnowski and Rosenberg, 1986) might be offered as a
counterexample to my claim, but it will not serve this purpose. First,
Sejnowski and Rosenberg explicitly state that NETtalk is not meant to
be a model of any human cognitive process. Second, perhaps the major
computational problem in reading nonwords aloud - coping with the fact
that the mapping of letters to phonemes is often many-to-one, so that
the words AT, ATE, ACHE and EIGHT all have just two phonemes - is not
dealt with by NETtalk. The input upon which the network operates is
precoded by hand in such a way that there is always a one-to-one
mapping of orthographic symbol to phoneme; so NETtalk does not have to
try to solve this problem.
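
The many-to-one problem can be made concrete with a small sketch. The
segmentations and phoneme labels below are written by hand for
illustration only; they are an assumption and are not NETtalk's actual
input coding. Each word is split into graphemes so that every grapheme
corresponds to exactly one phoneme, which is the kind of alignment a
reader (or a model) has to discover when it is not supplied in advance.

```python
# Hand-written, illustrative segmentations (hypothetical, not NETtalk's
# coding scheme). Each entry pairs a grapheme with a single phoneme;
# phoneme labels are informal ASCII stand-ins.
SEGMENTATIONS = {
    "AT":    [("A", "ae"), ("T", "t")],
    "ATE":   [("A.E", "ei"), ("T", "t")],    # split digraph A...E
    "ACHE":  [("A.E", "ei"), ("CH", "k")],
    "EIGHT": [("EIGH", "ei"), ("T", "t")],
}

for word, graphemes in SEGMENTATIONS.items():
    phonemes = [phoneme for _, phoneme in graphemes]
    print(f"{word:<6}{len(word)} letters -> "
          f"{len(phonemes)} phonemes: {' '.join(phonemes)}")
```

The letter count grows from two to five while the phoneme count stays at
two, so a letter-by-letter mapping cannot work without some prior
grouping of letters into graphemes.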

References

Besner, D., Twilley, L., McCann, R.S. and Seergobin, K. (1990) On the
association between connectionism and data: are a few words necessary? 
Psychological Review, 97, 432-446.

Bub, D., Cancelliere, A. and Kertesz, A. (1985) Whole-word and analytic
translation of spelling to sound in a non-semantic reader. In Patterson,
K., Marshall, J.C. and Coltheart, M. (eds) (1985) Surface Dyslexia:
Cognitive and Neuropsychological Studies of Phonological Reading. London:
Lawrence Erlbaum Associates Ltd.

Funnell, E.  (1983) Phonological processes in reading: New evidence from
acquired dyslexia. British Journal of Psychology, 74, 159-180.

McCarthy, R. and Warrington, E.K. (1986) Phonological reading: phenomena
and paradoxes. Cortex, 22, 359-380.

Pinker, S. and Prince, A. (1988) On language and connectionism: Analysis of
a parallel distributed model of language acquisition. Cognition, 28,
73-194.

Seidenberg, M. S. and McClelland, J. L. (1989). A distributed,
developmental model of word recognition and naming. Psychological Review, 
96, 523-568.

Sejnowski, T.J.  and Rosenberg, C.R.  (1986) NETtalk: A parallel network
that learns to read aloud (EE and CS Technical Report No. JHU/EECS-86/01).
Baltimore, Maryland: Johns Hopkins University.

Skoyles, J. (1991) Connectionism, Reading and the Limits of
Cognition. PSYCOLOQUY 2.8.4.

----------------------------------------------------------------------

From: John R Skoyles <ucjtprs at ucl.ac.uk>
Subject: 2.9.3.2 Reply to Coltheart / Skoyles

        The Success of PDP and the Dual Route Model:
         Time to Rethink the Phonological Route
               (Reply to Coltheart)

                John R. Skoyles
            Department of Psychology
           University College London
               London WC1E 6BT
              ucjtprs at ucl.ac.uk

Max Coltheart makes a good defense of the dual route model (the view
that there are separate phonological and nonphonological ways of
recognising written words) but he appears to overlook the fact that I
am attempting to do the same thing: to defend the existence of more
than one route in reading. I am going about this in a completely
different manner, however, and Coltheart does not seem to have
spotted either how I do this or the degree to which we agree.

My strategy is to take the main alternative to the dual route model --
PDP (a single "route" connectionist [network] account of phonological
and nonphonological word recognition) -- and show that even if PDP is as
good as its advocates claim, it is incomplete and needs a separate
reading mechanism to come into existence.

The problem is that the reading abilities of PDP models need to be
tutored using error correction feedback. Where this feedback comes
from, however, is left out of PDP accounts of reading. In my target
article (Skoyles 1991) I showed that error correction feedback can only
exist if there is some process independent of the PDP network which can
identify words correctly and so judge whether or not the network has
read them correctly. Without this, error correction feedback, and hence
the reading abilities of PDP networks, cannot occur. I further showed
that research on child reading and dyslexia strongly suggests that this
independent recognition of written words depends in human readers upon
sounding out words and accessing oral knowledge of their pronunciation.
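
The dependence on an external teacher can be seen in a minimal sketch of
error-correction learning. It uses the simple delta rule on a single
linear unit rather than the backpropagation used in the PDP reading
simulations themselves, and the input pattern, target value and learning
rate are all hypothetical.

```python
def train_step(weights, inputs, target, rate=0.1):
    """One delta-rule update for a single linear output unit."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output  # requires a target supplied from outside
    return [w + rate * error * x for w, x in zip(weights, inputs)]


spelling = [1.0, 0.0, 1.0]  # hypothetical orthographic input pattern
correct = 1.0               # hypothetical "teacher" value for the output

# With an independent teacher, the output is driven towards the target.
weights = [0.0, 0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, spelling, correct)
output = sum(w * x for w, x in zip(weights, spelling))
print(round(output, 3))

# Without one, the only available "target" is the network's own output,
# so the error is zero and the weights never change.
untrained = [0.0, 0.0, 0.0]
own_output = sum(w * x for w, x in zip(untrained, spelling))
stuck = train_step(untrained, spelling, own_output)
print(stuck)
```

The update is proportional to the difference between the target and the
output, so some process outside the network has to supply the target.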

Coltheart essentially claims that I need not go so far: PDP simply
cannot model the most interesting aspects of reading and so the above
argument is premature. I cannot go along with his critique of PDP,
although in many ways I would like to (I cannot be alone in longing for
the good old days when the dual route model reigned supreme). It is not
as easy as Coltheart implies to dismiss the phonological reading
abilities shown by PDP networks. They do read a large number of
nonwords correctly -- though Coltheart is right to note that they are
not as good as skilled readers. Nonetheless, they do read some nonwords
correctly, which is surprising given that PDP networks lack any specific
knowledge of spelling-sound transcoding. These nonword reading skills
are important even if they are not as good as those of proficient
readers because we can no longer automatically assume that every time
people read a nonword they do so using an independent grapheme-phoneme
phonological route -- for they might instead be reading them (at least
some of the time) by something like a PDP network.

My disagreement with Coltheart concerns whether there are one or two
kinds of phonological reading -- I suggest at least two exist. The
first process is attentive (such as when you have to stop reading War and
Peace to work out the pronunciation of the names of the characters).
Attentive decoding depends, I suggest, upon rule-like grapheme-phoneme
decoding. The second process, nonattentive phonological decoding (when you
read monosyllables which happen not to be real words like VIZ),
depends, I suggest, upon PDP networks. In contrast to attentive
phonological decoding, nonattentive phonological decoding depends on
generating phonology using the statistical regularities between
spelling and pronunciation that are incidentally acquired by PDP networks
when they are learning to read real words. The processes responsible
for attentive and nonattentive phonological coding are independent of
each other. Both attentive and nonattentive phonological decoding can
produce phonological output that can be used to access the oral
knowledge of word pronunciation contained in the speech system to
identify words (perhaps along with semantic and sentence context
information -- see note 1). The boundary between the two forms of
phonological decoding in any individual will depend upon their reading
experience and their innate phonological capacities -- a five-year-old
will probably only be able to read a monosyllabic nonword attentively,
whereas a linguist will have no difficulty nonattentively sounding out
obscure polysyllabic Russian names.

My difference with Coltheart lies in our respective ways of defining
the nonlexical reading route. Coltheart takes it to be a phonological
route which reads nonwords through the use of explicit spelling-sound
correspondence rules. I instead take it to be primarily a route using
phonological decoding processes that can identify words by using the
phonological information contained in word spelling to access a
reader's oral knowledge of how words sound. Although nonattentive
phonological processes can access oral knowledge, I suggest that this
is much less likely than the use of attentive processes. If we focus on
decoding a spelling to recognise the word behind its pronunciation we
are more likely to adopt attentive rather than nonattentive processes
as a consequence of stopping and focusing.

Thus although we both support the existence of two reading routes, we have
very different notions as to what they are. In this context, I will
answer Coltheart's points one by one. I paraphrase his criticisms
before describing my replies.

(1)  "PDP models are not very accurate at reading nonwords ... people are."

As noted, people use a mix of attentive and nonattentive phonological
decoding, whereas PDP networks simulate only the nonattentive kind.

(2)  "People are very accurate at deciding whether or not a pronounceable
letter string is a real word (lexical decision) ... [PDP models are not]."

First, the nature of lexical decision is controversial, with some
arguing that it involves access to lexical representations and others
that it does not (Balota & Chumbley, 1984). In addition, in order for PDP
models to simulate lexical decisions, new assumptions are added to them.
PDP models are designed to give correct phonological output to a given
spelling input and not to make lexical decisions. To model lexical
decisions, the modelers have made the additional assumption that back
activation from the hidden units to the input units reflects some
measure of lexicality. This is an assumption added to the model;
hence it could be this assumption as much as the model which is at
fault.

(3)  "Some brain lesions leave people with good nonword reading abilities
but with damaged lexical word recognition abilities -- surface dyslexia."

Fine, such people are relying upon attentive phonological "sounding out"
processes; their nonattentive processes are damaged along with their lexical
reading processes.

(4)  "Some brain lesions leave people with good lexical reading abilities
but with damaged phonological ones -- phonological dyslexia."

Unfortunately, acquired phonological dyslexia is rather rare (Funnell's
patient, whom Coltheart cites, is nearly a unique case). It is so rare
that afflicted individuals might have had phonological reading problems
prior to their brain damage (Van Orden, Pennington and Stone, 1990).

The difference between Coltheart and myself is that whereas he collapses
nonattentive and attentive phonological reading together, I separate them.
Can our two positions be tested? I think they can. If I am right,
skilled readers should read nonwords with two levels of performance:
First, they should display a high level of competence when they are
free to use attentive phonological decoding. Second, they should show a
lower level of success when they attempt to read nonwords while doing a
secondary task which blocks their use of attentive phonological
decoding and thereby confines their nonword reading to nonattentive
processes. I suggest that this lower level of performance (if it
exists) is the one against which PDP simulations of nonword reading
should be compared as this should reflect only nonattentive nonword
reading -- the phonological ability modeled by PDP simulations of
reading.

Note

1.   It is possible that sentence and other contextual sources of
information are used in accessing oral knowledge following phonological
decoding: the hearing of words is highly context dependent and so I would
expect any "inner ear" identification of words to be likewise. 

References

Balota, D. A., & Chumbley, J. I. (1984). Where are the effects of frequency
in visual word recognition tasks? Right where we said they were:
Comment on Monsell, Doyle, and Haggard (1989). Journal of Experimental
Psychology: General, 111, 231-237.

Van Orden, G. C., Pennington, B. F. & Stone, G. O. (1990). Word
identification in reading and the promise of subsymbolic
psycholinguistics. Psychological Review, 97, 488-522.

------------------------------

                             PSYCOLOQUY 
                           is sponsored by 
                     the Science Directorate of 
                the American Psychological Association 
                           (202) 955-7653 

                              Co-Editors:

(scientific discussion)         (professional/clinical discussion)

    Stevan Harnad          Perry London, Dean,     Cary Cherniss (Assoc Ed.)
Psychology Department  Graduate School of Applied   Graduate School of Applied
Princeton University   and Professional Psychology  and Professional Psychology
                            Rutgers University           Rutgers University

                           Assistant Editor:

                             Malcolm Bauer 
                         Psychology Department
                         Princeton University
End of PSYCOLOQUY Digest
******************************