Discussion II: Reading & Neural Nets

Stevan Harnad harnad at Princeton.EDU
Sun Nov 24 01:12:30 EST 1991


Subject: Discussion II: Reading & Neural Nets

Here is the second of two exchanges concerning the Target Article
on Reading and Connectionism that appeared in PSYCOLOQUY 2.8.4
(retrievable by anonymous ftp from directory pub/harnad on
princeton.edu). Further commentary is invited. All contributions
will be refereed. Please submit to psyc at pucc.bitnet or
psyc at pucc.princeton.edu -- NOT TO THIS LIST.


Subject: PSYCOLOQUY V2 #9 (2.9.4 Commentary: Reilly, Skoyles : 410 lines)
PSYCOLOQUY   ISSN 1055-0143     Sun, 24 Nov 91       Volume 2 : Issue   9.4
      2.9.4.1  Commentary on Skoyles Connectionism, Reading... / Reilly
      2.9.4.2 Reply to Reilly / Skoyles

----------------------------------------------------------------------

From: Ronan Reilly ERC <M160 at eurokom.ie>
Subject: 2.9.4.1  Commentary on Skoyles Connectionism, Reading... / Reilly

  There's More to Connectionism than Feedforward and Backpropagation

           (Commentary on Skoyles Connectionism, Reading and
            the Limits of Cognition PSYCOLOQUY 2.8.4 1991)

                            Ronan Reilly
                      Educational Research Centre
                       St Patrick's College
                             Dublin 9
                             IRELAND
                      ronan_reilly at eurokom.ie

1. Introduction

I think Skoyles has presented a novel idea for modeling the learning
of reading. The main aim of this commentary is to answer some of the
questions he raised in his preamble, particularly those relating to
connectionism, and finally to discuss some work I've done in the area
that may provide a starting point for implementing Skoyles's proposal.

2. The Nature of Connectionist Training

There are, as I'm sure will be pointed out in other commentaries, more
connectionist learning algorithms than error backpropagation and more
connectionist learning paradigms than supervised learning. So I am a
little puzzled by Skoyles's failure to find any research on issues
relating to the nature of error correction feedback. For example, what
about the research on reinforcement learning by Barto, Sutton, and
Anderson (1983)?  In this work, no detailed feedback is provided on
the correctness of the output vector. The teaching signal simply
indicates whether or not the output was correct.
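
A minimal sketch of this kind of scalar feedback, in the spirit of
associative reward-penalty learning (the single unit, the OR task, and
the learning rate are illustrative stand-ins, not Barto et al.'s actual
architecture):

    import numpy as np

    rng = np.random.default_rng(0)
    w = np.zeros(3)                       # two input weights plus a bias weight
    lr = 0.1

    def forward(x):
        p = 1.0 / (1.0 + np.exp(-w @ x))  # P(output = 1)
        return (1 if rng.random() < p else 0), p

    # Task: learn OR. The teacher never shows the correct output vector;
    # it only says whether the emitted output was right or wrong.
    patterns = [((0.0, 0.0, 1.0), 0), ((0.0, 1.0, 1.0), 1),
                ((1.0, 0.0, 1.0), 1), ((1.0, 1.0, 1.0), 1)]  # third input = bias

    for _ in range(5000):
        x, target = patterns[rng.integers(len(patterns))]
        x = np.asarray(x)
        y, p = forward(x)
        r = 1.0 if y == target else -0.2  # scalar right/wrong feedback only
        w += lr * r * (y - p) * x         # reinforce or suppress emitted output

    print([forward(np.asarray(x))[0] for x, _ in patterns])  # expect [0, 1, 1, 1]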

On the issue of delayed error feedback: In order to deal with temporal
disparities between input and error feedback, the network has to
incorporate some form of memory that preserves sequential information.
A standard feedforward network obviously has a memory, but it is one in
which the temporal aspect of the input is discarded. Indeed, modelers
usually go out of their way to discourage any temporal artifacts in
training by randomising the order of input. Elman (1990) devised a
simple technique for giving feedforward networks a temporal memory. It
involves taking a copy of the activation pattern of the hidden units at
time t and using it as input at time t+1, in addition to whatever
other input there might be. The weights connecting these copy units (or
context units) to the hidden units are themselves modifiable, just like
the other weights in the network. Consequently, these weights accrete
information about the input sequence in diminishing amounts over a
number of preceding time steps. In these simple recurrent networks it
is possible, therefore, for the input at time t to affect the output of
the network at time t+n, for relatively small n. The corollary to this
is that it is possible for error feedback to have an effect on learning
at a temporal remove from the input to which it relates.
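
A minimal sketch of such a simple recurrent network, with the gradient
truncated at the copy step as in Elman's scheme (the toy sequence,
layer sizes, and learning rate are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    I, H, O = 3, 8, 3                     # one-hot inputs/outputs, hidden units
    Wx = rng.normal(0, 0.5, (H, I))       # input -> hidden
    Wc = rng.normal(0, 0.5, (H, H))       # context (copied hidden) -> hidden
    Wo = rng.normal(0, 0.5, (O, H))       # hidden -> output
    lr = 0.1

    # Toy task: predict the next symbol of the repeating sequence A B A C.
    # From A the correct prediction is B or C depending on what preceded
    # the A, so the context units are genuinely needed.
    seq = np.eye(3)[np.tile([0, 1, 0, 2], 50)]

    for epoch in range(200):
        context = np.zeros(H)
        for t in range(len(seq) - 1):
            x, target = seq[t], seq[t + 1]
            h = np.tanh(Wx @ x + Wc @ context)
            z = Wo @ h
            y = np.exp(z - z.max()); y /= y.sum()   # softmax prediction
            dz = y - target                         # cross-entropy gradient
            dh = (Wo.T @ dz) * (1 - h * h)
            Wo -= lr * np.outer(dz, h)              # backprop one step only:
            Wx -= lr * np.outer(dh, x)              # the copy into the context
            Wc -= lr * np.outer(dh, context)        # units is not differentiated
            context = h.copy()                      # the Elman copy step

    context, correct = np.zeros(H), 0
    for t in range(len(seq) - 1):
        h = np.tanh(Wx @ seq[t] + Wc @ context)
        correct += (Wo @ h).argmax() == seq[t + 1].argmax()
        context = h
    print(correct / (len(seq) - 1))                 # close to 1.0 if learned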

Degraded error feedback is not a problem either. A number of
connectionist paradigms have made use of so-called "moving target"
learning. This occurs when the teaching vector (and even the input
vector) are themselves modified during training. The most recent
example of this is the recursive auto-associative memory (RAAM) of
Pollack (1990). I won't dwell on the ins and outs of RAAMs, but suffice it
to say that a key element in the training of such networks is the use
of their own hidden unit vectors as both input and teaching vectors.
Thus, the network is confronted with a very complex learning task,
since every time its weights are changed, the input and teaching
vectors also change. Nevertheless, networks such as these are capable
of learning successfully. In many ways, the task of the RAAM network is
not unlike that of the individual learning to read as characterized by
Skoyles.
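
A minimal sketch of the moving-target character of RAAM training, for a
single two-level tree (the code width, learning rate, and tree are
illustrative; this is not Pollack's full implementation):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4                                  # width of terminals and codes
    We = rng.normal(0, 0.5, (n, 2 * n))    # encoder: (left, right) -> code
    Wd = rng.normal(0, 0.5, (2 * n, n))    # decoder: code -> (left, right)
    lr = 0.2

    A, B, C = np.eye(n)[0], np.eye(n)[1], np.eye(n)[2]   # terminal patterns

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    # Train the net to compress and reconstruct the tree ((A B) C).
    for epoch in range(5000):
        # Re-derive the code for (A B) under the CURRENT weights: as We
        # changes, the input and teaching vector for the outer pair change
        # too -- the "moving target".
        ab = sigmoid(We @ np.concatenate([A, B]))
        for left, right in [(A, B), (ab, C)]:
            x = np.concatenate([left, right])
            code = sigmoid(We @ x)
            out = sigmoid(Wd @ code)
            d_out = (out - x) * out * (1 - out)      # reconstruction error
            d_code = (Wd.T @ d_out) * code * (1 - code)
            Wd -= lr * np.outer(d_out, code)
            We -= lr * np.outer(d_code, x)

    root = sigmoid(We @ np.concatenate([ab, C]))
    print(np.round(sigmoid(Wd @ root), 2))   # approximates (code(A B), C)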

My final word on the topic of connectionist learning algorithms
concerns their psychological status. I think it is important to
emphasise that many aspects of backpropagation learning are
psychologically unrealistic. Apart from the fact that the algorithm
itself is biologically implausible, the level of specificity required
of the teacher is just not found in most psychological learning
contexts. Furthermore, the randomized nature of the training regime and
the catastrophic interference that occurs when a network is trained on
new associations do not correspond to many realistic learning
situations (if any). What is important about connectionist learning is
not the learning as such, but what gets learned. It is the nature of
the representations embodied in the weight matrix of a network that
gives connectionist models their explanatory and predictive power.

3. Phonetic Reading

In what follows, I assume that what Skoyles means by "phonetic reading"
is phonologically mediated access to word meaning. I don't think it is
yet possible to say that phonology plays no role in accessing the
meaning of a word. However, Seidenberg (1989) has argued persuasively
that much of the evidence in favor of phonological mediation can be
accounted for by the simultaneous activation of both orthographic and
phonological codes, and none of the evidence addresses the central
issue of whether or not access is mediated by phonological codes.
Personally, I am inclined to the view that access to meaning among
skilled readers is direct from the orthography.

I was puzzled by Skoyles's description of the Seidenberg and McClelland
(1989) model, first, as a model of reading, and second, as a model of
non-phonetic reading. It certainly is not a model of reading, since in
the implementation they discuss there is no access to meaning.
Furthermore, how can it be considered to be nonphonetic when part of
the model's training involves teaching it to pronounce words?  In fact,
Seidenberg and McClelland's model seems to be a red herring in the
context of the issues Skoyles wishes to address.

4. A Modelling Framework

I am currently working on modeling the role of phonics in teaching
reading using a connectionist framework (Reilly, 1991). The model I've
developed might provide a suitable framework for addressing Skoyles's
hypothesis. It consists of two components: a speech component which is
trained first and learns to map a sequence of phonemes onto a lexical
representation. The weights in this network are frozen after training.
The second component is a network that maps an orthographic
representation onto a lexical representation. This mapping can be
either via the hidden units in the speech module (i.e., the
phonological route), via a separate set of hidden units (i.e., the
direct route), or via both sets of hidden units. I have operationalized
different teaching emphases (e.g., phonics vs. whole-word vs. a mixed
approach) by allowing or disallowing the training of the weights
comprising the two lexical access routes.
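
Schematically, such an arrangement might be wired as follows. This is
only an illustrative sketch of the two-route idea, not Reilly's (1991)
implementation: the sequential phoneme input is omitted, and all layer
sizes and names are invented.

    import numpy as np

    rng = np.random.default_rng(3)
    G, H, L = 12, 8, 6                    # orthographic, hidden, lexical sizes

    # Hidden -> lexical weights of the speech module, assumed already
    # trained on the phonology -> lexical task and now frozen.
    Wp_out = rng.normal(0, 0.5, (L, H))

    # Trainable orthographic pathways.
    Wg_phon = rng.normal(0, 0.1, (H, G))  # orthography -> speech module hiddens
    Wg_dir = rng.normal(0, 0.1, (H, G))   # orthography -> direct-route hiddens
    Wdir_out = rng.normal(0, 0.1, (L, H))

    TRAIN_PHON_ROUTE = True               # a "phonics" teaching emphasis
    TRAIN_DIRECT_ROUTE = True             # "whole-word" emphasis; both = "mixed"

    def forward(ortho):
        h_phon = np.tanh(Wg_phon @ ortho)           # phonological route
        h_dir = np.tanh(Wg_dir @ ortho)             # direct route
        return Wp_out @ h_phon + Wdir_out @ h_dir, h_phon, h_dir

    def train_step(ortho, lexical_target, lr=0.05):
        global Wg_phon, Wg_dir, Wdir_out
        y, h_phon, h_dir = forward(ortho)
        err = y - lexical_target
        if TRAIN_PHON_ROUTE:              # frozen Wp_out is reused; only the
            Wg_phon -= lr * np.outer(     # orthographic weights into it adapt
                (Wp_out.T @ err) * (1 - h_phon ** 2), ortho)
        if TRAIN_DIRECT_ROUTE:
            Wdir_out -= lr * np.outer(err, h_dir)
            Wg_dir -= lr * np.outer(
                (Wdir_out.T @ err) * (1 - h_dir ** 2), ortho)

    train_step(rng.random(G), rng.random(L))   # one step on stand-in vectors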

Preliminary results suggest that a mixed approach gives the best
overall word recognition performance, but this has not proved entirely
reliable over replications of training with different initial weight
settings. I am currently working on various refinements to the model.

In addition to providing a testbed for the phonics issue, the model
I've outlined might also provide a framework for implementing Skoyles's
idea, and it might perhaps help derive some testable hypotheses from it.
For example, it would be possible to use the lexical output produced
as a result of taking the phonological route as a teaching signal for the
direct route. I imagine that this might give rise to distinct forms of
word recognition error, forms not found if a "correct" teaching signal
were used.
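
Continuing the illustrative sketch above, this proposal would amount to
replacing the externally supplied lexical target with the phonological
route's own output:

    # Instead of a "correct" lexical_target, let the phonological route's
    # own output serve as the teaching signal for the direct route.
    def self_teaching_step(ortho, lr=0.05):
        global Wg_dir, Wdir_out
        h_phon = np.tanh(Wg_phon @ ortho)
        pseudo_target = Wp_out @ h_phon            # the route's own "reading"
        h_dir = np.tanh(Wg_dir @ ortho)
        err = (Wdir_out @ h_dir) - pseudo_target   # direct route chases it
        Wdir_out -= lr * np.outer(err, h_dir)
        Wg_dir -= lr * np.outer((Wdir_out.T @ err) * (1 - h_dir ** 2), ortho)

    self_teaching_step(rng.random(G))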

5. Conclusion

I think that Skoyles's idea is interesting and worthy of exploration. I
feel, however, that his view of current connectionist modeling is
somewhat narrow. Contrary to the impression he appears to have,
there are connectionist learning architectures and techniques available
that address many of the issues he raises.

6. References

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuron-like
elements that can solve difficult learning control problems. IEEE
Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14,
179-211.

Pollack, J. B. (1990). Recursive distributed representations.
Artificial Intelligence, 46, 77-105.

Reilly, R. (1991). A Connectionist exploration of the phonics issue in
the teaching of reading: Re-using internal representations. In Working
notes of AAAI Spring Symposium on connectionist natural language
processing. March, 1991, Stanford University, pp. 178-182.

Seidenberg, M. S. (1989). Visual word recognition and pronunciation. In
W. Marslen-Wilson (Ed.), Lexical representation and process. Cambridge,
MA: MIT Press, pp. 25-74.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed,
developmental model of word recognition and naming.
Psychological Review, 96, 523-568.

------------------------------

From: John R Skoyles <ucjtprs at ucl.ac.uk>
Subject: 2.9.4.2 Reply to Reilly / Skoyles

          The Limits of Connectionism and Cognition Revisited:
             All Reading Networks Need to Be Trained
                      (Reply to Reilly)

                   John R. Skoyles
                Department of Psychology
                University College London
                   London WC1E 6BT
                   ucjtprs at ucl.ac.uk

1. The nature of connectionist learning.

My argument against connectionist reading consists of two points.
First, reading networks do not get a "free meal" -- something for
nothing (Skoyles, 1988). To be able to read they have to be
trained. To parallel the popular cliche "junk in, junk out," reading
networks depend upon "mappings in, mappings out." What is called
"reading" in these networks is a mapping usually from a written word to
its pronunciation (but potentially also to its meaning). To get to that
state, however, they need to be trained on exemplar mappings -- the
reading network does not get the information to make its mappings
miraculously from nowhere but from mappings previously given to it.
Error-correction is one way of doing exemplar training -- the network
can only make an error in the context of a correct output for a given
input. (Of course, reading networks create new mappings not given to
them, but the information to do so derives from mappings with which
they have been previously trained. So in a sense there is a free meal,
but the network has to be fed something first.)
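
A minimal sketch of this point: in exemplar training the error signal is
defined only relative to a target mapping supplied from outside the
network (the vectors here are arbitrary stand-ins for spellings and
pronunciations):

    import numpy as np

    rng = np.random.default_rng(4)
    W = np.zeros((5, 5))                 # spelling -> pronunciation mapping
    exemplars = [(rng.random(5), rng.random(5)) for _ in range(20)]

    for _ in range(500):
        spelling, pronunciation = exemplars[rng.integers(len(exemplars))]
        error = W @ spelling - pronunciation   # no given target, no error:
        W -= 0.1 * np.outer(error, spelling)   # the "mappings in" must come
                                               # from outside the network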

Second, the proponents of reading networks maintain that there is a free meal
because they focus entirely upon the "mappings out," forgetting where the
"mappings in" that train the network come from. Instead of miracles, I suggest that phonological
reading -- identifying a written word -- provides this information. This
conjecture fits in with the evidence about phonological reading and learner
readers and dyslexia. Improving a learner reader's skill at identifying
words from their spelling enhances their progress in learning to read (Adams,
1990). Dyslexics lack phonological abilities and so find it difficult to
identify words from their spelling (Snowling, 1987). These two facts make
sense if the phonological identification of words is providing the "mappings
in" to train the reading network.

1.1. Supervised learning.

In my target article (Skoyles 1991) I discussed Seidenberg and
McClelland's (1989) model of reading, which uses supervised
backpropagation learning. Reilly correctly points out that there is
more to connectionism than backpropagation and supervised learning.

I focused upon these because they are used by the published models of
reading. This does not diminish the generality of my points. For
example, Reilly correctly points out that Barto, Sutton, and Anderson
(1983) have proposed a model of reinforcement training which contains
no detailed information about the correctness of the output vector.
As Reilly points out, however, they nonetheless use a teaching signal that
indicates whether or not the output was correct. But how would a system
training a network know whether or not its output was correct 
without some independent means of recognising words? My point applies
not only to backpropagation but any form of supervised learning (because to
tutor the network the supervisor has to know something the network does not).

1.2. Unsupervised learning.

My point also applies to unsupervised networks -- for example Boltzmann
nets. These are given inputs and are apparently not corrected. There is
a stage in Boltzmann training, however, when the network's actual and
desired outputs are compared to form the objective function, and
depending upon this the internal weights in the network are or are not
retained. Thus, this unsupervised learning still begs the question of
the availability of knowledge regarding the desired output of the
network: Without this the objective function cannot be calculated.
Although the network may be unsupervised, it is not unregulated. It is
given exemplar input and desired outputs. In the case of reading, the
desired output will be the correct reading of a written word (its
input). But the Boltzmann network cannot by itself know that any
reading is correct and hence desired: Something outside the system has
to be able to read to do this. In other words, the same situation that I
showed holds for supervised networks also holds for unsupervised ones.
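
A minimal sketch of this point for a fully visible Boltzmann network,
with both phases computed by exact enumeration (toy sizes; biases
omitted): the "clamped" statistics are gathered with the output units
pinned to the desired outputs, so whatever runs the training must
already know them.

    import numpy as np
    from itertools import product

    n = 4                                # units 0-1: input; units 2-3: output
    W = np.zeros((n, n))                 # symmetric weights, zero diagonal
    # Exemplar pairs: input bits followed by the DESIRED output bits
    # (here, AND and OR of the inputs).
    data = [np.array(v, dtype=float) for v in
            [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 1)]]
    states = [np.array(s, dtype=float) for s in product([0, 1], repeat=n)]

    def free_expectation(W):
        """<s_i s_j> under the free-running Boltzmann distribution."""
        energy = np.array([-0.5 * s @ W @ s for s in states])
        p = np.exp(-energy); p /= p.sum()
        return sum(pi * np.outer(s, s) for pi, s in zip(p, states))

    # The clamped statistics cannot be computed without the desired outputs.
    clamped = sum(np.outer(v, v) for v in data) / len(data)

    for _ in range(500):
        W += 0.1 * (clamped - free_expectation(W))   # Boltzmann learning rule
        np.fill_diagonal(W, 0.0)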

1.3. Auto-associative learning.

Reilly raises the possibility of auto-associative learning. Networks
using this do not have to be fed information in the form of
error correction, nor do they need correct exemplar input-output
pairs supplied from outside, because their input doubles as their
desired output. I would question, however, whether a network dependent
entirely upon auto-associative learning could learn to read. This may
work well with categorization skills, but as far as I am aware, not
with mapping tasks (such as reading) which involve learning a
vocabulary. I would be very interested to see whether anyone can create
such a net. Of course, there is no reason a network may not use
auto-associative learning in combination with non-autoassociative
training.
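
A minimal sketch of pure auto-association: the input doubles as the
desired output, so no outside teacher is needed, but what gets learned
is reconstruction of the input rather than an arbitrary
spelling-to-pronunciation mapping (sizes and patterns are illustrative):

    import numpy as np

    rng = np.random.default_rng(6)
    We = rng.normal(0, 0.3, (3, 8))      # 8 -> 3 bottleneck encoder
    Wd = rng.normal(0, 0.3, (8, 3))      # 3 -> 8 decoder
    patterns = np.eye(8)                 # toy "categories" to auto-associate

    for _ in range(5000):
        x = patterns[rng.integers(8)]
        h = np.tanh(We @ x)
        out = Wd @ h
        err = out - x                    # the target IS the input itself
        Wd -= 0.1 * np.outer(err, h)
        We -= 0.1 * np.outer((Wd.T @ err) * (1 - h * h), x)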

2. Biological plausibility.

I agree with Reilly's observation that backpropagation is biologically
implausible. However, new learning procedures have been developed which
are biologically feasible (Mazzoni, Andersen & Jordan, 1991). In addition,
as noted above, my observation is a general one, which would apply
much more widely than just to the cases of backpropagation and
supervised learning. Although it is unlikely the networks responsible for
reading in the brain use backpropagation, it is likely that they are
subject to the same kinds of constraints noted above and in
my original target article.

3. Network learning vs the internal representations of networks as
objects of interest.

I am slightly concerned that Reilly suggests "What is important about
connectionist learning is not the learning as such, but what gets
learned. It is the nature of the representations embodied in the weight
matrix of a network that gives connectionist models their explanatory
and predictive power." This seems an abdication of responsibility.
Connectionist models are described as learning models, not
representation models. Their authors emphasise that their training is
not an incidental means for their creation but something that might
enlighten us about the process by which networks are acquired. Reilly's
own simulation of reading is concerned not with what gets learnt but
with which of three reading instruction methods (whole word, phonic or
mixed whole word and phonics) trains reading networks best. In
addition, if we get the mechanism by which networks develop wrong, can
we be confident that their internal representations are going to be
correct, and consequently of interest?

4. Phonetic reading.

As I note in my accompanying reply to Coltheart (1991; Skoyles 1991),
phonetic reading can mean two things. First, phonological decoding --
something measured by the ability to read nonwords. Second, the
identification of written words using information about how they are
spelt and orally pronounced. In the latter, a reader uses some kind of
phonological decoding to access oral vocabulary to identify words -- so
they are associated. However, phonological decoding may be achieved by
several means -- lexical analogies and even, to some extent, the
reading network itself (see my comments on this in my reply to Coltheart,
1991). But whereas a reading network can phonologically decode words, it
cannot recognise words by accessing the knowledge we have of how they
are pronounced in oral vocabularies. Access to that information through
phonological decoding is the critical thing I suggest for training
networks -- not the phonological decoding involved.

Reilly rightly points out that Seidenberg and McClelland's (1989) model does not
fully cover all aspects of reading, in particular, access to meaning. However,
my observations would generalise to reading models which cover this. This is
because my observation is about input/output mapping and it does not matter
if the output is not phonology but meaning. In this case, phonological reading
accesses the meaning of words from oral vocabulary, which is then used to
train the semantic output of the reading network. I did not develop this
point simply because Seidenberg and McClelland's model, as Reilly notes, does
not cover meaning.

5. Reilly's own model

Reilly only briefly describes his own contribution to understanding how
reading networks develop. I am very interested in his suggestion that
"it would be possible to use the lexical output produced as a result of
taking the phonological route as a teaching signal for the direct
route." As he notes, this might produce "distinct forms of word
recognition error." Experiments in this area seem a good idea, though
perhaps Reilly's network needs to be refined (he notes that it is not
entirely reliable over replications with different initial weights). I
would like to see whether his "phonic" route could take more account of
units of spelling-sound correspondence larger than the grapheme-phoneme
one, because children seem to start with larger units (the sort used in
lexical analogies).

6. Conclusion

Reilly suggests that learning architectures and techniques are available
which address the issues I raised in my original target article. With the
exception of Reilly's own model, as I hope I have shown above, this is not true.
         
7. References


Adams, M. J. (1990). Beginning to read: Thinking and learning about print.
Cambridge, MA: MIT Press.

Coltheart, M. (1991) Connectionist modeling of human language processing: The
case of reading. PSYCOLOQUY 2.9.3.

Mazzoni, P., Andersen, R. A., & Jordan, M. I. (1991). A more biologically
plausible learning rule for neural networks. Proceedings of the National
Academy of Sciences USA, 88, 4433-4437.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental
model of word recognition and naming. Psychological Review, 96, 523-568.

Skoyles, J. R. (1988) Training the brain using neural-network models. Nature
333, 401.

Skoyles, J. R. (1991) Connectionism, reading and the limits of cognition.
PSYCOLOQUY 2.8.4.

Skoyles, J. R. (1991) The success of PDP and the dual route model: Time to
rethink the phonological route. PSYCOLOQUY 2.9.3.

Snowling, M. (1987). Dyslexia: A cognitive developmental perspective. Oxford:
Basil Blackwell.

------------------------------

                             PSYCOLOQUY 
                           is sponsored by 
                     the Science Directorate of 
                the American Psychological Association 
                           (202) 955-7653 
 
                              Co-Editors:
 
         Stevan Harnad (scientific discussion)
         Psychology Department, Princeton University

         Perry London, Dean, and Cary Cherniss (Assoc. Ed.)
           (professional/clinical discussion)
         Graduate School of Applied and Professional Psychology,
         Rutgers University

                           Assistant Editor:

                             Malcolm Bauer 
                         Psychology Department
                         Princeton University
End of PSYCOLOQUY Digest
******************************

