[ACT-R-users] similarity

Peter Pirolli pirolli at parc.com
Fri Jun 17 12:50:05 EDT 2005


There may be some confusion over notions of similarity, so let me try to 
help. I would like to second Roy's pointer to the Manning and Schuetze
book: It is an excellent introduction to Pointwise Mutual Information (PMI) 
and LSA (it's an excellent introduction to just about every basic concept 
and technique in statistical natural language processing). As I point out 
in the Pirolli (2005) Cognitive Science paper, PMI is approximately the 
same thing as association strength in ACT-R. PMI (like LSA) is very good at 
generating scores that correlate well with synonym judgements (e.g., as 
tested by the TOEFL test), which might be one way to define "similarity." 
The PMI scores in the GLSA server are used to set distances among word 
items that feed into the LSA computation (Ayman Farahat argues that this 
provides better results on a number of tests).
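
For readers who have not met PMI before, the standard definition is
PMI(x, y) = log( P(x, y) / (P(x) P(y)) ). The following is a minimal Python
sketch of that formula, not the GLSA server's code. It assumes probabilities
are estimated from document frequencies, and the function name `pmi_table`
and the four-document `toy_corpus` are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Estimate PMI for every word pair that co-occurs in at least one document.

    Assumption: P(x) is the share of documents containing x, and P(x, y) is
    the share of documents containing both x and y.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Count each word once per document; sorting gives a canonical pair order.
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / n_docs
        p_x = word_counts[x] / n_docs
        p_y = word_counts[y] / n_docs
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus, chosen so that "apple" co-occurs more with "mac"
# than with "orange".
toy_corpus = [
    "apple mac computer",
    "apple mac software",
    "apple orange fruit",
    "orange fruit juice",
]
scores = pmi_table(toy_corpus)
for pair in [("apple", "mac"), ("apple", "orange"), ("fruit", "orange")]:
    print(pair, round(scores[pair], 3))
```

In this toy corpus the apple/mac pair scores higher than apple/orange simply
because of which documents happen to be included, which is the kind of corpus
dependence at issue in this thread.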

The GLSA scores depend on the particulars of the document corpus 
over which they are computed, and we're interested in improving that corpus.

--Pete


At 08:15 AM 6/17/2005 -0700, Roy Wilson wrote:
>On Friday 17 June 2005 10:59, Roman Belavkin wrote:
> > I have looked at the site, and it states that it uses mutual information as
> > a metric, which, as we know, measures statistical dependence (so it is not
> > just correlations between terms).  Still, I am not sure that similarity, as we
> > understand it, and statistical dependence are the same thing.  For example,
> > here are the similarities for the word apple:
> >
> > apple mac 0.23059334
> > apple microsoft 0.21670483
> > apple dkz 0.21022835
> > etc...
> >
> > So, the most `similar' word to apple is mac (or how about `dkz'?).  To me,
> > orange or fruit seem more similar terms.  What these numbers show is the
> > degree of statistical dependence between two terms in the documents analysed.
> >
> > It would be an interesting project for the ACT-R community to investigate
> > this difference.  What do you think?
>
>I don't know what the answer is, but Manning and Schutze have a fairly
>extensive discussion of these issues in (I think) "Foundations of Statistical
>Natural Language Processing" (MIT Press, 1999).
>
>
> >
> > Roman
> >
> >       -----Original Message-----
> >       From: act-r-users-admin at act-r.psy.cmu.edu on behalf of Roman Belavkin
> >       Sent: Fri 6/17/2005 15:17
> >       To: Kelley, Troy (Civ,ARL/HRED); act-r-users at act-r.psy.cmu.edu
> >       Cc:
> >       Subject: RE: [SPAM: 4.500] RE: [ACT-R-users] ACT-R output from GLSA server at PARC
> >
> >
> >
> >       Hi,
> >
> >       I think the word `similar' is not really appropriate here.  LSA just
> >       shows co-occurrence of the two words really, and the higher
> >       co-occurrence for the word man can be explained simply because man
> >       also means human, while woman is more specific and thus less ambiguous.
> >
> >
> >       Cheers,
> >       Roman
> >
> >               -----Original Message-----
> >               From: act-r-users-admin at act-r.psy.cmu.edu on behalf of Kelley, Troy (Civ,ARL/HRED)
> >               Sent: Thu 6/9/2005 21:52
> >               To: act-r-users at act-r.psy.cmu.edu
> >               Cc:
> >               Subject: [SPAM: 4.500] RE: [ACT-R-users] ACT-R output from GLSA server at PARC
> >
> >
> >
> >               Here is a sample output from the GLSA server at PARC for the
> >               word Love
> >
> >               love    love    0.9999997
> >               love    fun     0.16717705
> >               love    city    0.103955254
> >               love    place   0.2078263
> >               love    man     0.36086833
> >               love    friend  0.19896048
> >               love    neighbor        -0.037641484
> >               love    woman   0.21863273
> >               love    fondness        0.0076576206
> >
> >               Interestingly, love is more similar to the word "man" than to
> >               "woman", and more similar to "man" than to "fondness" or "friend".
> >
> >               Troy
> >
> >               -----Original Message-----
> >               From: act-r-users-admin at act-r.psy.cmu.edu
> >               [mailto:act-r-users-admin at act-r.psy.cmu.edu] On Behalf Of
> >               Raluca.Budiu at parc.com
> >               Sent: Thursday, June 09, 2005 3:54 PM
> >               To: act-r-users at act-r.psy.cmu.edu
> >               Subject: [ACT-R-users] ACT-R output from GLSA server at PARC
> >
> >
> >
> >               Some of you may have heard about PARC's effort to build an
> >               external GLSA/PMI server; it is now available at:
> >
> >               http://glsa.parc.com/
> >
> >               and it produces ACT-R output (other formats are supported as
> >               well).
> >
> >               GLSA (Generalized Latent Semantic Analysis) is an LSA-like
> >               method of computing word similarities, but it has the advantage
> >               of an adjustable, web-based corpus. It takes as input a list of
> >               word pairs and provides similarities between those words.
> >
> >               Just very recently it started providing ACT-R output; this is
> >               an ACT-R file that defines a meaning chunk type and sets the
> >               Sij-s between words to their similarity value as computed by
> >               the server.
> >
> >               The server is still in a development phase, but please feel
> >               free to experiment with it. Comments and suggestions are very
> >               welcome.
> >
> >               Raluca Budiu
> >
> >               _______________________________________________
> >               ACT-R-users mailing list
> >               ACT-R-users at act-r.psy.cmu.edu
> >               http://act-r.psy.cmu.edu/mailman/listinfo/act-r-users
>
>--
>Roy Wilson
>Learning Research Development Center
>University of Pittsburgh
>webpage: www.pitt.edu/~rwilson
>email: rwilson at pitt.edu




