Error gradients through associative learning

Tue Sep 28 12:15:22 EDT 1999

At the workshop this summer I was intrigued at how partial matching
crept kudzu-like into so many conversations, and this led me to think
some more about errors and how they might occur as a function of
experience with the environment.  Below is an introduction to a
positional-confusion model that I hope to present more formally soon.
Comments, criticisms, bomb threats, all are welcome.

Erik.

-----------------------
Erik M. Altmann
Psychology 2E5
George Mason University
Fairfax, VA  22030
703-993-1326
altmann at gmu.edu
hfac.gmu.edu/~altmann
-----------------------

Motivations

(1) Account for error gradients within the ACT-R theory.  By "within
    the theory" I exclude ACT-R's partial matching mechanism, which is
    less an explanation than a representation or re-description of the
    effect, and which is isolated from other theoretical constraints
    (in particular, learning).

(2) Improve on Estes's perturbation model, another popular model of
    error gradients that fails to address the construction of the
    underlying memory representation.  The perturbation model assumes
    that memory is organized as an array along the dimension of
    distortion, presupposing a higher-order memory structure without
    explaining how it might arise.  This is misleading in the sense
    that it suggests implicitly that multi-dimensional arrays are a
    natural way to organize human memory (if it were so, perhaps we
    would have had ARRP instead of LISP).

(3) Test some of my serial-attention assumptions, including frequent
    refresh of the goal chunk from declarative memory, a limited goal
    stack, and no effective retrieval threshold (meaning that memory
    always returns something, so that interference becomes as
    important as decay in determining error).

(4) Link error gradients, as one particular class of error, back to
    patterns of interaction with the environment, with the hope of
    making predictions about errors in interactive behavior.

Current results

(1) An associative-learning model that reproduces positional gradients
    and serial position curves in memory for order.  A study by Nairne
    (1992) tested implicit memory for order with retention intervals
    ranging from 30 sec to 24 hours.  The model fits Nairne's data
    with R^2 = .96 and RMSE = 3.7% over 75 data points, improving both
    on his application of the perturbation model and on the ACT-R
    partial matching model reported in Anderson & Matessa (1997).

(2) The AL model acquires the memory representation underlying the
    error pattern.  That is, it does the whole task, from presentation
    through retention to test.  Perturbation and partial matching
    don't address how the underlying representations might be acquired
    in the first place.

(3) The AL model is distinguished from the perturbation model by
    predictions about primacy and recency.  Perturbation seems to
    predict primacy equal to recency, though at one point it predicted
    primacy less than recency (Lee & Estes, 1977).  In contrast, the
    AL model predicts primacy greater than recency, because items are
    linked associatively in the forward direction, and later items are
    more likely to be preceded by bad links.  It turns out that
    primacy is greater than recency in Nairne's conditions and
    elsewhere (e.g., Healy, 1971, cited in Estes, 1997).  However, the
    difference isn't discussed and no statistical tests are reported.

(4) It's not clear that the partial-matching mechanism explains much
    variance in the cognitive arithmetic data, either.  The fit of the
    Siegler model in Chapter 3 actually improves slightly if one turns
    off partial matching and removes the production syntax that
    constrains chunk retrieval.  (See Files, below.)  I haven't tested
    this change in the more complex models in Chapter 9.

Summary of processing in the AL model

  At presentation

    1. "Move attention" to a new item.  This adds two chunks to
       memory, a cue and a target.  The cue can be interpreted as a
       positional code and the target as the item itself, but this is
       somewhat arbitrary; the point is that these two chunks make up
       the current item.

    2. Retrieve the cue from memory, with the previous target in the
       focus of attention.  This creates a link from the previous
       target to the current cue.

    3. Retrieve the target from memory, with the cue in the focus of
       attention.  This creates a link from the current cue to the
       current target.

    4. Go to 1.

  At recall

    1. Retrieve an unretrieved cue, with the previous target in the
       focus of attention as an activation source.  People know what's
       "unretrieved" because all presented items and all retrieved
       items are visible.

    2. Retrieve an unretrieved target, with the cue in the focus of
       attention as an activation source.

    3. Go to 1.
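
    The presentation and recall loops above can be sketched in a few
    lines of Python.  This is only an illustration, not the model in
    nairne.txt: the function names, the START marker, the recency-based
    activation with decay d, the link weight w, and the logistic noise
    with scale s are all assumptions made for the sketch.  It also
    assumes that the focus holds whatever was retrieved last, even when
    that retrieval was an error.

```python
import math
import random

START = -1  # hypothetical marker standing in for "no previous target"

def noisy_max(candidates, activation, s, rng):
    """Return the candidate with the highest noisy activation.
    Something is always returned; there is no retrieval threshold."""
    def score(c):
        if s == 0:
            return activation(c)
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        return activation(c) + s * math.log(u / (1 - u))  # logistic noise
    return max(candidates, key=score)

def present(n, s, rng, d=0.5):
    """Build target->cue and cue->target links for an n-item list."""
    target_to_cue, cue_to_target = {}, {}
    focus = START
    for i in range(n):
        # Step 1: item i adds cue i and target i to memory.
        # Assumed activation: newer chunks are more active.
        recency = lambda j: -d * math.log(i - j + 1)
        # Step 2: retrieve a cue with the previous target in focus.
        cue = noisy_max(range(i + 1), recency, s, rng)
        target_to_cue[focus] = cue
        # Step 3: retrieve a target with that cue in focus.
        target = noisy_max(range(i + 1), recency, s, rng)
        cue_to_target[cue] = target
        focus = target
    return target_to_cue, cue_to_target

def recall(n, target_to_cue, cue_to_target, s, rng, w=1.0):
    """Recall by following links, choosing only among unretrieved chunks."""
    cues, targets = set(range(n)), set(range(n))
    focus, order = START, []
    while targets:
        # Step 1: retrieve an unretrieved cue; the focus is the source.
        cue = noisy_max(sorted(cues),
                        lambda c: w * (target_to_cue.get(focus) == c), s, rng)
        cues.discard(cue)
        # Step 2: retrieve an unretrieved target; the cue is the source.
        target = noisy_max(sorted(targets),
                           lambda t: w * (cue_to_target.get(cue) == t), s, rng)
        targets.discard(target)
        order.append(target)
        focus = target
    return order

rng = random.Random(0)
links = present(6, 0.3, rng)
print(recall(6, *links, 0.3, rng))
```

    With s = 0 every retrieval is correct and recall returns the list
    in presentation order; raising s produces association errors at
    presentation and retrieval errors at recall.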

Emergence of effects in the AL model

    Positional gradients are a function of retrieval errors on the cue
    or target, on steps 2 or 3, respectively, during presentation.  A
    retrieval error creates an association error, or a link from a cue
    to the wrong target (or target to wrong cue).  More recent chunks
    are more active, so more likely to intrude as retrieval errors.
    The resulting pattern of association errors, in which near
    neighbors are more often incorrectly linked than far neighbors,
    causes more near misses than far misses at recall time.
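
    The claim that recent chunks intrude more often can be illustrated
    with a small calculation.  The sketch below assumes a single-trace
    activation of -d * ln(age) and a softmax choice rule with
    temperature t; the function name and the parameter values are
    hypothetical and are not taken from the model files.

```python
import math

def intrusion_by_lag(n_prior, d=0.5, t=0.3):
    """Probability of retrieving the chunk `lag` items back.
    Lag 0 is the current (correct) chunk; lags >= 1 are intrusions."""
    acts = [-d * math.log(lag + 1) for lag in range(n_prior + 1)]
    weights = [math.exp(a / t) for a in acts]
    total = sum(weights)
    return [w / total for w in weights]

probs = intrusion_by_lag(5)
for lag, p in enumerate(probs):
    print(f"lag {lag}: {p:.3f}")
```

    Under these assumptions the intrusion probability falls off
    monotonically with lag, so incorrect links are more likely to join
    near neighbors than far neighbors.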

    Reduced accuracy over time is a function of retrieval errors at
    recall that are due to decay and to increased interference from
    other chunks.  Thus the gradients themselves don't decay, and they
    don't seem to decay in people either (Nairne's 24-hour condition).

    The bow in the serial position curve arises because elements in
    the middle of the list have more near neighbors than elements at
    the ends of the list, so there is more opportunity for incorrect
    associations to govern retrieval at recall.

    Primacy is greater than recency because chunks are linked in a
    forward direction, and the probability of error is cumulative as
    one moves through the list, at presentation and at recall.  The
    effect is to rotate the serial position curve clockwise slightly.
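
    The cumulative part of this argument can be made concrete with a
    deliberately crude calculation.  It assumes that each forward link
    is correct independently with the same probability q (a
    hypothetical value), and it isolates the forward-chaining
    component only; it does not capture the bow that comes from
    neighbor density.

```python
def p_intact_chain(position, q):
    """Probability that every forward link up to `position` is correct,
    assuming independent links with per-link accuracy q."""
    return q ** position

q = 0.95  # hypothetical per-link accuracy
for pos in range(1, 7):
    print(pos, round(p_intact_chain(pos, q), 3))
```

    Because the probability shrinks with each additional link, later
    positions are at a disadvantage relative to earlier ones, which is
    the sense in which primacy exceeds recency.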

Other assumptions

    1. Something is always retrieved at recall time, reflecting the
       forced-choice nature of the task.  If an "episodic" chunk for a
       cue or target is not retrieved, then a "semantic" chunk for
       that cue or target is retrieved instead.  The difference is
       that the episodic chunk is linked into the associative chain
       and the semantic chunk is not.  If a semantic chunk for target
       T (or cue C) is retrieved, then T (or C) is "used up" so the
       episodic chunk will not be retrieved (on that trial).

    2. Cognition produces chunks at some relatively constant rate, so
       retention should be represented both in terms of decay (of cue and
       target chunks) and in terms of increased interference from
       other chunks.  Each chunk adds a term to the denominator of the
       chunk-choice equation.  However, that many chunks are expensive
       to compute over, and one gets essentially the same effect by
       increasing noise linearly over time.
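
    The trade-off in assumption 2 can be seen in a softmax form of the
    chunk-choice equation, in which the probability of choosing a chunk
    is exp(A/t) divided by the sum of exp(A_j/t) over all competing
    chunks.  The sketch below uses hypothetical activation values and
    temperatures; it only shows that adding competitors and raising
    the noise both lower the probability of retrieving the target, not
    that the two are quantitatively equivalent.

```python
import math

def p_choice(target_act, other_acts, t):
    """Softmax choice probability for the target chunk.
    Each competing chunk adds one term to the denominator."""
    num = math.exp(target_act / t)
    return num / (num + sum(math.exp(a / t) for a in other_acts))

list_rivals = [0.0] * 4  # hypothetical activations of other list chunks

# Interference: more (weak) chunks accumulate over the retention interval.
more_chunks = [p_choice(1.0, list_rivals + [-1.0] * extra, t=0.5)
               for extra in (0, 10, 100)]

# Cheaper stand-in: keep the chunk set fixed and raise the noise instead.
more_noise = [p_choice(1.0, list_rivals, t) for t in (0.5, 1.0, 2.0)]

print([round(p, 3) for p in more_chunks])
print([round(p, 3) for p in more_noise])
```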

Files

    hfac.gmu.edu/people/altmann/nairne.txt      Associative learning model
    hfac.gmu.edu/people/altmann/nairne.xl       Model fits

    hfac.gmu.edu/people/altmann/siegler-ema.txt   Modified Siegler model
    hfac.gmu.edu/people/altmann/siegler-ema.xl    Model fits