[ACT-R-users] on similarity and glsa

Roman Belavkin R.Belavkin at mdx.ac.uk
Fri Jun 17 10:40:18 EDT 2005


I have looked at the site, and it states that it uses mutual information as a metric, which, as we know, measures statistical dependence (so it is not just correlation between terms).  Still, I am not sure that similarity, as we understand it, and statistical dependence are the same thing.  For example, here are the similarities for the word apple:
 
apple mac 0.23059334
apple microsoft 0.21670483
apple dkz 0.21022835
etc...
 
So the most `similar' word to apple is mac (or how about `dkz'?).  To me, orange or fruit seem like more similar terms.  What these numbers show is the degree of statistical dependence between two terms in the documents analysed.
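
To make the distinction concrete, here is a minimal sketch of how a dependence score of this kind behaves. It assumes pointwise mutual information computed over document-level co-occurrence; I do not know the exact formula or corpus the server uses, and the toy corpus and the function names (pmi_table, pmi) are invented purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only (not data from the GLSA server).
DOCS = [
    "apple mac computer",
    "apple mac software",
    "apple microsoft software",
    "apple orange fruit",
    "orange fruit juice",
    "banana fruit juice",
]


def pmi_table(docs):
    """Return {frozenset({w1, w2}): PMI in bits} from document co-occurrence."""
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in docs:
        words = set(doc.split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    return {
        pair: math.log2((count / n) / math.prod(word_df[w] / n for w in pair))
        for pair, count in pair_df.items()
    }


def pmi(table, w1, w2):
    """PMI of a pair; pairs that never co-occur are scored as -inf."""
    return table.get(frozenset((w1, w2)), float("-inf"))


table = pmi_table(DOCS)
for other in ("mac", "microsoft", "orange", "banana"):
    print(f"apple {other} {pmi(table, 'apple', other):.3f}")
```

In this toy corpus apple scores higher with mac than with orange, simply because apple and mac appear in the same documents more often than their base rates predict. The score says nothing about the two words belonging to the same category, and banana, which never appears alongside apple, gets no finite score at all.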
 
It would be an interesting project for the ACT-R community to investigate this difference.  What do you think?
 
Roman

	-----Original Message----- 
	From: Roman Belavkin 
	Sent: Fri 6/17/2005 15:17 
	To: Kelley, Troy (Civ,ARL/HRED); act-r-users at act-r.psy.cmu.edu 
	Cc: 
	Subject: RE: [SPAM: 4.500] RE: [ACT-R-users] ACT-R output from GLSA server at PARC
	
	
	Hi,
	 
	I think the word `similar' is not really appropriate here.  LSA really just shows the co-occurrence of the two words, and the higher co-occurrence for the word man can be explained simply: man also means human, while woman is more specific and thus less ambiguous.
	 
	Cheers,
	Roman

		-----Original Message----- 
		From: act-r-users-admin at act-r.psy.cmu.edu on behalf of Kelley, Troy (Civ,ARL/HRED) 
		Sent: Thu 6/9/2005 21:52 
		To: act-r-users at act-r.psy.cmu.edu 
		Cc: 
		Subject: [SPAM: 4.500] RE: [ACT-R-users] ACT-R output from GLSA server at PARC
		
		

		Here is a sample output from the GLSA server at PARC for the word Love 

		love    love    0.9999997 
		love    fun     0.16717705 
		love    city    0.103955254 
		love    place   0.2078263 
		love    man     0.36086833 
		love    friend  0.19896048 
		love    neighbor        -0.037641484 
		love    woman   0.21863273 
		love    fondness        0.0076576206 

		Interesting, love is more similar to the word "man" than "woman", and 
		more similar to "man" than "fondness" or "friend". 

		Troy 

		-----Original Message----- 
		From: act-r-users-admin at act-r.psy.cmu.edu 
		[mailto:act-r-users-admin at act-r.psy.cmu.edu] On Behalf Of 
		Raluca.Budiu at parc.com 
		Sent: Thursday, June 09, 2005 3:54 PM 
		To: act-r-users at act-r.psy.cmu.edu 
		Subject: [ACT-R-users] ACT-R output from GLSA server at PARC 



		Some of you may have heard about PARC's effort to build an external 
		GLSA/PMI server; it is now available at: 

		http://glsa.parc.com/ 

		and it produces ACT-R output (other formats are supported as well). 

		GLSA (Generalized Latent Semantic Analysis) is an LSA-like method of 
		computing word similarities, but it has the advantage of an adjustable, 
		web-based corpus. It takes as input a list of word pairs and provides 
		similarities between those words. 

		Just very recently it started providing ACT-R output; this is an ACT-R 
		file that defines a meaning chunk type and sets the Sij-s between words 
		to their similarity value as computed by the server.  

		The server is still in a development phase, but please feel free to 
		experiment with it. Comments and suggestions are very welcome. 

		Raluca Budiu 

		_______________________________________________ 
		ACT-R-users mailing list 
		ACT-R-users at act-r.psy.cmu.edu 
		http://act-r.psy.cmu.edu/mailman/listinfo/act-r-users 






