[ACT-R-users] similarity

Roy Wilson rwilson+ at pitt.edu
Fri Jun 17 11:15:46 EDT 2005


On Friday 17 June 2005 10:59, Roman Belavkin wrote:
> I have looked at the site, and it states that it uses mutual information as
> a metric, which, as we know, measures statistical dependence (so it is not
> just correlation between terms).  Still, I am not sure that similarity, as we
> understand it, and statistical dependence are the same thing.  For example,
> here are the similarities for the word apple:
>
> apple mac 0.23059334
> apple microsoft 0.21670483
> apple dkz 0.21022835
> etc...
>
> So, the most `similar' word to apple is mac (or how about `dkz'?).  To me,
> orange or fruit seem like more similar terms.  What these numbers show is the
> degree of statistical dependence between two terms in the documents analysed.
>
> It would be an interesting project for the ACT-R community to investigate
> this difference.  What do you think?

I don't know what the answer is, but Manning and Schütze have a fairly
extensive discussion of these issues in (I think) "Foundations of Statistical
Natural Language Processing" (MIT Press, 1999).
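
To make Roman's distinction concrete, here is a minimal sketch of pointwise
mutual information computed from document co-occurrence counts.  The corpus
is a made-up toy and pmi_table is a name chosen purely for illustration; this
is not a claim about how the GLSA server computes its numbers.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only (not data from the GLSA server).
DOCS = [
    "apple mac computer",
    "apple mac software",
    "apple microsoft software",
    "orange fruit juice",
    "apple fruit",
    "orange juice",
]


def pmi_table(docs):
    """Return a function pmi(x, y) based on document-level co-occurrence."""
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in docs:
        words = sorted(set(doc.split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    def pmi(x, y):
        joint = pair_df[tuple(sorted((x, y)))]
        if joint == 0:
            # The words never share a document in this corpus.
            return float("-inf")
        p_xy = joint / n
        p_x = word_df[x] / n
        p_y = word_df[y] / n
        return math.log2(p_xy / (p_x * p_y))

    return pmi


pmi = pmi_table(DOCS)
for other in ("mac", "fruit", "orange"):
    print("apple", other, pmi("apple", other))
```

On this toy corpus apple/mac gets a positive score because the two words
share documents, while apple/orange never co-occur and so score negative
infinity, whatever one thinks of their similarity as fruits.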


>
> Roman
>
> 	-----Original Message-----
> 	From: act-r-users-admin at act-r.psy.cmu.edu on behalf of Roman Belavkin
> 	Sent: Fri 6/17/2005 15:17
> 	To: Kelley, Troy (Civ,ARL/HRED); act-r-users at act-r.psy.cmu.edu
> 	Cc:
> 	Subject: RE: [SPAM: 4.500] RE: [ACT-R-users] ACT-R output from GLSA server at PARC
>
>
>
> 	Hi,
>
> 	I think the word `similar' is not really appropriate here.  LSA really just
> shows co-occurrence of the two words, and the higher co-occurrence for the word
> man can be explained simply by the fact that man also means human, while woman
> is more specific and thus less ambiguous.
>
>
> 	Cheers,
> 	Roman
>
> 	        -----Original Message-----
> 	        From: act-r-users-admin at act-r.psy.cmu.edu on behalf of Kelley, Troy (Civ,ARL/HRED)
> 	        Sent: Thu 6/9/2005 21:52
> 	        To: act-r-users at act-r.psy.cmu.edu
> 	        Cc:
> 	        Subject: [SPAM: 4.500] RE: [ACT-R-users] ACT-R output from GLSA server at PARC
>
>
>
> 	        Here is a sample output from the GLSA server at PARC for the word
> Love
>
> 	        love    love        0.9999997
> 	        love    fun         0.16717705
> 	        love    city        0.103955254
> 	        love    place       0.2078263
> 	        love    man         0.36086833
> 	        love    friend      0.19896048
> 	        love    neighbor    -0.037641484
> 	        love    woman       0.21863273
> 	        love    fondness    0.0076576206
>
> 	        Interestingly, love is more similar to the word "man" than to
> "woman", and more similar to "man" than to "fondness" or "friend".
>
> 	        Troy
>
> 	        -----Original Message-----
> 	        From: act-r-users-admin at act-r.psy.cmu.edu
> 	        [mailto:act-r-users-admin at act-r.psy.cmu.edu] On Behalf Of
> 	        Raluca.Budiu at parc.com
> 	        Sent: Thursday, June 09, 2005 3:54 PM
> 	        To: act-r-users at act-r.psy.cmu.edu
> 	        Subject: [ACT-R-users] ACT-R output from GLSA server at PARC
>
>
>
> 	        Some of you may have heard about PARC's effort to build an
> external GLSA/PMI server; it is now available at:
>
> 	        http://glsa.parc.com/
>
> 	        and it produces ACT-R output (other formats are supported as
> well).
>
> 	        GLSA (Generalized Latent Semantic Analysis) is an LSA-like method
> of computing word similarities, but it has the advantage of an adjustable,
> web-based corpus. It takes as input a list of word pairs and provides
> similarities between those words.
>
> 	        It started providing ACT-R output only very recently; this is an
> ACT-R file that defines a meaning chunk type and sets the Sij-s between
> words to the similarity values computed by the server.
>
> 	        The server is still in a development phase, but please feel free
> to experiment with it. Comments and suggestions are very welcome.
>
> 	        Raluca Budiu
>
> 	        _______________________________________________
> 	        ACT-R-users mailing list
> 	        ACT-R-users at act-r.psy.cmu.edu
> 	        http://act-r.psy.cmu.edu/mailman/listinfo/act-r-users

-- 
Roy Wilson
Learning Research Development Center
University of Pittsburgh
webpage: www.pitt.edu/~rwilson
email: rwilson at pitt.edu



