Error gradients through associative learning
ERIK M. ALTMANN
altmann at gmu.edu
Tue Sep 28 12:15:22 EDT 1999
At the workshop this summer I was intrigued at how partial matching
crept kudzu-like into so many conversations, and this led me to think
some more about errors and how they might occur as a function of
experience with the environment. Below is an introduction to a
positional-confusion model that I hope to present more formally soon.
Comments, criticisms, bomb threats, all are welcome.
Erik.
-----------------------
Erik M. Altmann
Psychology 2E5
George Mason University
Fairfax, VA 22030
703-993-1326
altmann at gmu.edu
hfac.gmu.edu/~altmann
-----------------------
Motivations
(1) Account for error gradients within the ACT-R theory. By "within
the theory" I exclude ACT-R's partial matching mechanism, which is
less an explanation than a representation or re-description of the
effect, and which is isolated from other theoretical constraints
(in particular, learning).
(2) Improve on Estes's perturbation model, another popular model of
error gradients that fails to address the construction of the
underlying memory representation. The perturbation model assumes
that memory is organized as an array along the dimension of
distortion, presupposing a higher-order memory structure without
explaining how it might arise. This is misleading in the sense
that it suggests implicitly that multi-dimensional arrays are a
natural way to organize human memory (if it were so, perhaps we
would have had ARRP instead of LISP).
(3) Test some of my serial-attention assumptions, including frequent
refresh of the goal chunk from declarative memory, a limited goal
stack, and no effective retrieval threshold (meaning that memory
always returns something, so that interference becomes as
important as decay in determining error).
(4) Link error gradients, as one particular class of error, back to
patterns of interaction with the environment, with the hope of
making predictions about errors in interactive behavior.
Current results
(1) An associative-learning model that reproduces positional gradients
and serial position curves in memory for order. A study by Nairne
(1992) tested implicit memory for order with retention intervals
ranging from 30 sec to 24 hours. The model fits Nairne's data
with R^2 = .96 and RMSE = 3.7% over 75 data points, improving both
on his application of the perturbation model and on the ACT-R
partial matching model reported in Anderson & Matessa (1997).
(2) The AL model acquires the memory representation underlying the
error pattern. That is, it does the whole task, from presentation
through retention to test. Perturbation and partial matching
don't address how the underlying representations might be acquired
in the first place.
(3) The AL model is distinguished from the perturbation model by
predictions about primacy and recency. Perturbation seems to
predict primacy equal to recency, though at one point it predicted
primacy less than recency (Lee & Estes, 1977). In contrast, the
AL model predicts primacy greater than recency, because items are
linked associatively in the forward direction, and later items are
more likely to be preceded by bad links. It turns out that
primacy is greater than recency in Nairne's conditions and
elsewhere (e.g., Healy, 1971, cited in Estes, 1997). However, the
difference isn't discussed and no statistical tests are reported.
(4) It's not clear that the partial-matching mechanism explains much
variance in the cognitive arithmetic data, either. The Siegler model in
Chapter 3 actually improves slightly if one turns off partial
matching and removes the production syntax that constrains chunk
retrieval. (See Files, below.) I haven't tested this change in
the more complex models in Chapter 9.
Summary of processing in the AL model
At presentation
1. "Move attention" to a new item. This adds two chunks to
memory, a cue and a target. The cue can be interpreted as a
positional code and the target as the item itself, but this is
somewhat arbitrary; the point is that these two chunks make up
the current item.
2. Retrieve the cue from memory, with the previous target in the
focus of attention. This creates a link from the previous
target to the current cue.
3. Retrieve the target from memory, with the cue in the focus of
attention. This creates a link from the current cue to the
current target.
4. Go to 1.
At recall
1. Retrieve an unretrieved cue, with the previous target in the
focus of attention as an activation source. People know what's
"unretrieved" because all presented items and all retrieved
items are visible.
2. Retrieve an unretrieved target, with the cue in the focus of
attention as an activation source.
3. Go to 1.
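
The presentation and recall loops above can be sketched in a few lines of Python. This is only an illustration of the chaining structure, not the ACT-R model in nairne.txt: the names present and recall, the tuple encoding of chunks, and the assumption that the first cue is available without a preceding target are all mine. Retrieval here is error-free, so there is no activation, noise, or "unretrieved" bookkeeping, and items are assumed to be unique.

```python
def present(items):
    """Build forward links: previous target -> cue -> target."""
    links = {}
    prev_target = None
    for pos, item in enumerate(items):
        # Step 1: attending to a new item adds a cue chunk and a target chunk.
        cue, target = ("cue", pos), ("target", item)
        # Step 2: retrieving the cue with the previous target in focus
        # links the previous target to the current cue.
        if prev_target is not None:
            links[prev_target] = cue
        # Step 3: retrieving the target with the cue in focus
        # links the current cue to the current target.
        links[cue] = target
        prev_target = target
    return links


def recall(links, n_items):
    """Follow the chain cue -> target -> next cue (error-free sketch)."""
    recalled = []
    cue = ("cue", 0)  # assumption: the first cue needs no preceding target
    for _ in range(n_items):
        target = links[cue]
        recalled.append(target[1])
        cue = links.get(target)
        if cue is None:  # end of the chain
            break
    return recalled


order = recall(present(["A", "B", "C", "D", "E"]), 5)
```

With perfect links the chain simply reproduces the presented order. The interesting behavior comes from corrupting entries in links, which is what retrieval errors at presentation do.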
Emergence of effects in the AL model
Positional gradients are a function of retrieval errors on the cue
or target, on steps 2 or 3, respectively, during presentation. A
retrieval error creates an association error, or a link from a cue
to the wrong target (or target to wrong cue). More recent chunks
are more active, so more likely to intrude as retrieval errors.
The resulting pattern of association errors, in which near
neighbors are more often incorrectly linked than far neighbors,
causes more near misses than far misses at recall time.
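
To make the "more recent chunks are more active" step concrete, here is a rough sketch of how intrusion probability might fall off with distance. It assumes a single-presentation base-level activation of -d * ln(age) and a softmax choice among chunks. The function name, the unit time steps, and the parameter values (d = 0.5, temperature = 0.5) are illustrative assumptions, not values taken from the model.

```python
import math


def intrusion_probabilities(n_prior, decay=0.5, temperature=0.5):
    """Retrieval probabilities for the current chunk (age 1) and n_prior
    earlier chunks (ages 2, 3, ...), listed most recent first."""
    ages = range(1, n_prior + 2)
    # Assumed activation after one presentation: -decay * ln(age).
    activations = [-decay * math.log(age) for age in ages]
    # Softmax over activations: more active chunks are more likely retrieved.
    weights = [math.exp(a / temperature) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]


probs = intrusion_probabilities(n_prior=4)
```

The first entry of probs is the chance of retrieving the intended (newest) chunk; the remaining entries are intrusions, and they shrink as the intruder gets older. Under these assumptions a wrong link is most likely to involve a near neighbor, which is the gradient.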
Reduced accuracy over time is a function of retrieval errors at
recall that are due to decay and to increased interference from
other chunks. Thus the gradients themselves don't decay, and they
don't seem to decay in people either (Nairne's 24-hour condition).
The bow in the serial position curve arises because elements in
the middle of the list have more near neighbors than elements at
the ends of the list, so there is more opportunity for incorrect
associations to govern retrieval at recall.
Primacy is greater than recency because chunks are linked in a
forward direction, and the probability of error is cumulative as
one moves through the list, at presentation and at recall. The
effect is to rotate the serial position curve clockwise slightly.
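
The cumulative-error argument can be illustrated with a deliberately crude calculation. Assume every link in the chain is intact independently with the same probability (link_ok, a made-up parameter), and that a position is reached correctly only if all links leading to it are intact. This ignores the neighbor effect behind the bow and captures only the tilt.

```python
def chain_accuracy(n_items, link_ok=0.95):
    """P(every link up to and including each position is intact)."""
    accuracy = []
    p = 1.0
    for pos in range(n_items):
        # Position 0 needs one link (cue -> target); each later position
        # needs two more (previous target -> cue, then cue -> target).
        p *= link_ok if pos == 0 else link_ok ** 2
        accuracy.append(p)
    return accuracy


curve = chain_accuracy(7)
```

The resulting curve falls monotonically with position. Superimposed on a symmetric bow, a tilt like this leaves early positions better off than late ones, i.e., primacy greater than recency.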
Other assumptions
1. Something is always retrieved at recall time, reflecting the
forced-choice nature of the task. If an "episodic" chunk for a
cue or target is not retrieved, then a "semantic" chunk for
that cue or target is retrieved instead. The difference is
that the episodic chunk is linked into the associative chain
and the semantic chunk is not. If a semantic chunk for target
T (or cue C) is retrieved, then T (or C) is "used up" so the
episodic chunk will not be retrieved (on that trial).
2. Cognition produces chunks at some relatively constant rate, so
retention should be represented both in terms of decay (of cue and
target chunks) and in terms of increased interference from
other chunks. Each chunk adds a term to the denominator of the
chunk-choice equation. However, that many chunks are expensive
to compute over, and one gets essentially the same effect by
increasing noise linearly over time.
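
For reference, the chunk-choice equation referred to in assumption 2 has the following general form. The notation is mine (A_i for the activation of chunk i, t for a temperature that grows with activation noise, A'_k for the activations of N chunks added during retention), since the equation isn't written out above.

```latex
P(i) = \frac{e^{A_i/t}}{\sum_{j} e^{A_j/t}}
\qquad\longrightarrow\qquad
P(i) = \frac{e^{A_i/t}}{\sum_{j} e^{A_j/t} + \sum_{k=1}^{N} e^{A'_k/t}}
```

Every added chunk contributes a positive term to the denominator, so P(i) for the chunk one wants can only go down as N grows. Raising t flattens the distribution and likewise lowers P(i) for the most active chunk, which is the sense in which increasing noise can stand in for explicitly adding chunks.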
Files
hfac.gmu.edu/people/altmann/nairne.txt Associative learning model
hfac.gmu.edu/people/altmann/nairne.xl Model fits
hfac.gmu.edu/people/altmann/siegler-ema.txt Modified Siegler model
hfac.gmu.edu/people/altmann/siegler-ema.xl Model fits