representation of input text in a conversation

Jim Davies jimmyd at cc.gatech.edu
Wed Mar 24 10:37:59 EST 1999


I have come up with how I will represent input text (for now, anyway).
Here is how it works: for the sentence "How are you" I have the chunks:

 (goal-2-1 
  ISA goal-talk 
  sentence-id 2 
  word how-word 
  next-word goal-2-2)

 (goal-2-2 
  ISA goal-talk 
  sentence-id 2 
  word are-word 
  next-word goal-2-3)

 (goal-2-3 
  ISA goal-talk 
  sentence-id 2 
  word you-word 
  next-word nil)

What are the advantages? 
1) You can only focus on one goal at a time, so the other chunks
involved in the sentence are just in memory like any other. To distinguish
them from words in other sentences, they have an identifier. You can think
of this as a timestamp of some sort, like Christian suggested in his
ACT-R/PM email. This is also how Anderson did it with his "parent" slot.

This is an improvement on my original suggestion, in which the words in
the chain didn't know what sentence they belonged to.  The model needs to
know this, though, so it doesn't confuse the focused sentence with others
in memory.
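
To illustrate how the sentence-id slot keeps sentences apart, here is a
minimal Perl sketch. It is only an illustration of the data layout, not
ACT-R code: the %memory hash and the chunks_in_sentence function are
hypothetical stand-ins for declarative memory and a retrieval.

```perl
use strict;
use warnings;

# Hypothetical stand-in for declarative memory: chunk name => slots.
my %memory = (
    'goal-1-1' => { 'sentence-id' => 1, word => 'hello-word', 'next-word' => 'goal-1-2' },
    'goal-1-2' => { 'sentence-id' => 1, word => 'kitty-word', 'next-word' => undef },
    'goal-2-1' => { 'sentence-id' => 2, word => 'how-word',   'next-word' => 'goal-2-2' },
    'goal-2-2' => { 'sentence-id' => 2, word => 'are-word',   'next-word' => 'goal-2-3' },
    'goal-2-3' => { 'sentence-id' => 2, word => 'you-word',   'next-word' => undef },
);

# Return the names of all chunks whose sentence-id matches.
# The plain string sort is adequate for this small example only.
sub chunks_in_sentence {
    my ($memory, $sentence_id) = @_;
    my @names = grep { $memory->{$_}{'sentence-id'} == $sentence_id }
                keys %$memory;
    return sort @names;
}

print join(' ', chunks_in_sentence(\%memory, 2)), "\n";
# prints: goal-2-1 goal-2-2 goal-2-3
```

Without the sentence-id slot, the words of "hello kitty" and "how are you"
could only be told apart by following every chain from its first chunk.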

2) The next-word slot points to a chunk name, rather than
an ordinal position,  as in Anderson's suggestion:

(a3
     word you
     parent s1
     position third) 

The reason I did it this way was that I wanted to be able to look for
consecutive word combinations. For example, to determine whether a
sentence is a question, you might want to look for a verb followed by a
noun, like "do you" rather than "you do." With a chain of chunks, you can
do this with one production.  With the ordinal position, I guess you'd
have to go through the whole list? But perhaps this is best,
psychologically. Any comments?
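
The difference between the two representations can be sketched in Perl.
This is a rough illustration under stated assumptions, not a model of
ACT-R production matching: the data, the %successor table, and the
functions has_pair_linked and has_pair_ordinal are all hypothetical. The
example sentence is "do you like tea".

```perl
use strict;
use warnings;

# Linked representation: each chunk names its successor in next-word.
my %linked = (
    'goal-3-1' => { word => 'do-word',   'next-word' => 'goal-3-2' },
    'goal-3-2' => { word => 'you-word',  'next-word' => 'goal-3-3' },
    'goal-3-3' => { word => 'like-word', 'next-word' => 'goal-3-4' },
    'goal-3-4' => { word => 'tea-word',  'next-word' => undef },
);

# Ordinal representation: each chunk has a parent and a position symbol.
my @ordinal = (
    { word => 'do',   parent => 's1', position => 'first'  },
    { word => 'you',  parent => 's1', position => 'second' },
    { word => 'like', parent => 's1', position => 'third'  },
    { word => 'tea',  parent => 's1', position => 'fourth' },
);

# Assumed extra knowledge the ordinal version needs: which position
# symbol comes after which.
my %successor = (first => 'second', second => 'third', third => 'fourth');

# Linked: once a chunk holding $first is found, its successor is a
# single lookup through the next-word slot.
sub has_pair_linked {
    my ($memory, $first, $second) = @_;
    for my $chunk (values %$memory) {
        next unless $chunk->{word} eq $first;
        my $next = $chunk->{'next-word'};
        return 1 if defined $next
                 && exists $memory->{$next}
                 && $memory->{$next}{word} eq $second;
    }
    return 0;
}

# Ordinal: after finding $first, work out the following position and
# then search again for a chunk with that parent and position.
sub has_pair_ordinal {
    my ($chunks, $first, $second) = @_;
    for my $left (@$chunks) {
        next unless $left->{word} eq $first;
        my $wanted = $successor{ $left->{position} };
        next unless defined $wanted;
        for my $right (@$chunks) {
            return 1 if $right->{parent}   eq $left->{parent}
                     && $right->{position} eq $wanted
                     && $right->{word}     eq $second;
        }
    }
    return 0;
}

print "linked  do-you: ", has_pair_linked(\%linked, 'do-word', 'you-word'), "\n";  # 1
print "linked  you-do: ", has_pair_linked(\%linked, 'you-word', 'do-word'), "\n";  # 0
print "ordinal do-you: ", has_pair_ordinal(\@ordinal, 'do', 'you'), "\n";          # 1
```

Both versions find the pair, but the ordinal one needs the extra
successor table and a second search.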

In defence of my position on this:
If you represented the word "monkey" as the 8th element in the sentence
"I have a huge collection of vervet monkey clothing," then you should be
fast on verification that "monkey" is the 8th word.
This sounds unlikely to me; I don't think we can retrieve that
information. I would conjecture you would be faster at verifying that
"clothing" followed "monkey" or that "vervet" came just before "monkey."




Following is a Perl script that takes in a file of sentences and outputs
chunks in this manner. Feel free to use or modify it. 

#!/usr/local/bin/perl   

# program name: goal-maker.perl
# author: jimmydavies at usa.net (Jim Davies)
# version: 1.0

# Creates a series of chunks representing a sentence for ACT-R
# modeling. It is a part of the primatech project:
#      http://www.cc.gatech.edu/~jimmyd/primatech/
# Each line of the input file is a sentence to be represented.
# Each word is put into a separate chunk. Chunks are named
# goal-u-v, where u and v are integers such that u is the
# sentence identifier (constant across all the words in the
# sentence) and v is the word identifier. So goal-3-5 means
# the fifth word in the third sentence.
#
# This program assumes that the chunk name for a word chunk whatever
# is whatever-word, to distinguish it from the whatever-concept, 
# whatever-written-word, etc.
#
# run the program like this from a UNIX command line:
#    goal-maker.perl < filename
#
# If it doesn't work, check to make sure that the path at the top 
# is correct for where your version of perl is installed.
#
# EXAMPLE:
#
# So if the input file has:
#    hello kitty
#    how are you
# 
# then the output of this script will be:
#
# (goal-1-1 
#  ISA goal-talk 
#  sentence-id 1 
#  word hello-word 
#  next-word goal-1-2)
#
# (goal-1-2 
#  ISA goal-talk 
#  sentence-id 1 
#  word kitty-word 
#  next-word nil)
#
# (goal-2-1 
#  ISA goal-talk 
#  sentence-id 2 
#  word how-word 
#  next-word goal-2-2)
#
# (goal-2-2 
#  ISA goal-talk 
#  sentence-id 2 
#  word are-word 
#  next-word goal-2-3)
#
# (goal-2-3 
#  ISA goal-talk 
#  sentence-id 2 
#  word you-word 
#  next-word nil)

# begin script

use strict;
use warnings;

# initialize
my $sentence_id = 1;

# read in each line as one single string
while (my $sentence_string = <STDIN>) {

    # remove the trailing newline, if any, so it does not end up
    # attached to the last word
    chomp $sentence_string;

    # split the string into an array of words, splitting at runs
    # of whitespace (leading whitespace is ignored)
    my @sentence = split(' ', $sentence_string);

    # skip blank lines so they do not produce an empty chunk
    next unless @sentence;

    # word_id is the word index in the current sentence
    my $word_id = 1;

    # an array in scalar context gives the sentence length
    my $sentence_length = scalar @sentence;

    # loop through all the words in the sentence
    foreach my $word (@sentence) {

        # create a start chunk
        print "(goal-$sentence_id-$word_id \n ISA goal-talk \n sentence-id $sentence_id ";

        # if this is the last word in the sentence the next chunk will be nil
        if ($word_id == $sentence_length) {
            print "\n word ${word}-word \n next-word nil)";
        }

        # otherwise, link it to the next word
        else {
            my $next_word_id = $word_id + 1;
            print "\n word ${word}-word \n next-word goal-$sentence_id-$next_word_id)";
        }

        print "\n\n";

        $word_id++;
    } # loop on words in the sentence

    $sentence_id++;
} # loop on sentences in the file



   





