[Research] Fwd: Thesis Proposal - Ajit Singh - 6/16/08

Sun Jun 15 13:23:26 EDT 2008

Begin forwarded message:

> From: Diane Stidle <diane+ at cs.cmu.edu>
> Date: June 3, 2008 1:34:55 PM EDT
> To: ml-seminar at cs.cmu.edu, Pedro Domingos  
> <pedrod at cs.washington.edu>, Tom Mitchell <Tom.Mitchell at cs.cmu.edu>,  
> Christos <christos at cs.cmu.edu>, Geoff <ggordon at cs.cmu.edu>
> Cc: Steve <fienberg at stat.cmu.edu>
> Subject: Thesis Proposal - Ajit Singh - 6/16/08
>
> Thesis Proposal - Ajit Singh
>
> Date: June 16, 2008
> Time: 3:00pm
> Place: 3305 Newell-Simon Hall
>
> Title: Efficient Models for Relational Learning
>
> Abstract:
> The primary difference between propositional (attribute-value) and
> relational data is the existence of relations, or links, between
> entities.  Graphs, relational databases, sets of tensors, and
> first-order knowledge bases are all examples of relational encodings.
> Because of the relations between entities, standard statistical
> assumptions, such as independence of entities, are violated.
> Moreover, these correlations should not be ignored, as they provide a
> source of information that can significantly improve the accuracy of
> common machine learning tasks (e.g., prediction, clustering) over
> propositional alternatives.  A current limitation of relational
> models is that learning and inference are often substantially more
> expensive than in propositional alternatives.  One of our objectives
> is the development of models that account for uncertainty in
> relational data while scaling to very large data sets, which often
> cannot fit in main memory.  To that end, we propose representing
> relational data as a set of tensors, one per relation, whose
> dimensions index different entity types in the data set.  Each tensor
> has a low-dimensional approximation, and the tensors share a
> low-dimensional factor for each shared entity type.  For the case of
> matrices, we refer to this model as collective matrix factorization.
>
> While existing techniques for relational learning assume a batch of
> data, we propose exploring extensions to active and mixed-initiative
> learning, where the learning algorithm can query its environment
> (typically a human user) about relationships between entities, the
> creation of new predicates, and relationships between predicates
> themselves.  It is our belief that the expressiveness of relational
> representations will allow for more efficient interaction between the
> learner and its environment, as well as lead to better predictive
> models for relational data.  Efficiency refers not only to
> computational efficiency, but also to the efficiency of data
> collection in active learning scenarios.  To support the claim that
> our models are efficient, we propose exploring three problems:
> predicting users' ratings of movies with side information, topic
> models for text using fMRI images of neural activation on words, and
> mixed-initiative tagging of e-mail and other information used by
> personal information managers (e.g., tasks from to-do lists, recently
> edited files, and calendar entries).
>
> The proposal document is found at:
> http://www.cs.cmu.edu/~ajit/pubs/proposal.pdf
>
> Thesis Committee:
> Geoffrey Gordon (Chair)
> Christos Faloutsos
> Tom Mitchell
> Pedro Domingos (Univ. of Washington)
>
>
> -- 
> *******************************************************************
> Diane Stidle
> Business & Graduate Programs Manager
> Machine Learning Department
> School of Computer Science
> 4612 Wean Hall			
> Carnegie Mellon University		
> 5000 Forbes Avenue		
> Pittsburgh, PA  15213-3891
> Phone: 412-268-1299
> Fax:   412-268-3431
> Email: diane at cs.cmu.edu	
> URL:http://www.ml.cmu.edu			
>
>
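For readers unfamiliar with the model named in the abstract above, here
is a minimal sketch of the shared-factor idea for two matrices.  It is
not taken from the proposal: the squared loss, the plain gradient-descent
updates, the function name collective_mf, and the user/movie/genre
reading of the matrices are all assumptions made for illustration.  X
(users x movies) is approximated by U V^T and Y (movies x genres) by
V Z^T, so the movie factor V is shared by both relations.

```python
import random


def predict(A, B):
    """Return A @ B^T for list-of-lists matrices A (n x k) and B (m x k)."""
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in B] for ra in A]


def residual(T, A, B):
    """Return T - A @ B^T elementwise."""
    return [[t - p for t, p in zip(rt, rp)]
            for rt, rp in zip(T, predict(A, B))]


def loss(X, Y, U, V, Z):
    """Sum of squared reconstruction errors over both relations."""
    both = (residual(X, U, V), residual(Y, V, Z))
    return sum(r * r for R in both for row in R for r in row)


def collective_mf(X, Y, rank=2, steps=1000, lr=0.05, seed=0):
    """Hypothetical helper: fit X ~ U V^T and Y ~ V Z^T with a shared V.

    Uses plain gradient descent on half the summed squared error; this is
    an illustrative choice, not the method described in the proposal.
    """
    rng = random.Random(seed)
    n, m, g = len(X), len(Y), len(Y[0])  # X is n x m, Y is m x g

    def init(rows):
        return [[rng.uniform(-0.5, 0.5) for _ in range(rank)]
                for _ in range(rows)]

    U, V, Z = init(n), init(m), init(g)
    for _ in range(steps):
        RX = residual(X, U, V)  # n x m residuals of the first relation
        RY = residual(Y, V, Z)  # m x g residuals of the second relation
        new_U = [[U[i][k] + lr * sum(RX[i][j] * V[j][k] for j in range(m))
                  for k in range(rank)] for i in range(n)]
        # V receives gradient contributions from both relations.
        new_V = [[V[j][k] + lr * (sum(RX[i][j] * U[i][k] for i in range(n))
                                  + sum(RY[j][c] * Z[c][k] for c in range(g)))
                  for k in range(rank)] for j in range(m)]
        new_Z = [[Z[c][k] + lr * sum(RY[j][c] * V[j][k] for j in range(m))
                  for k in range(rank)] for c in range(g)]
        U, V, Z = new_U, new_V, new_Z
    return U, V, Z


# Toy data, invented for this sketch only.
X = [[1.0, 0.8, 0.1, 0.0],
     [0.9, 1.0, 0.0, 0.2],
     [0.1, 0.0, 1.0, 0.9]]
Y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
U, V, Z = collective_mf(X, Y)
print("final loss:", loss(X, Y, U, V, Z))
```

Because V appears in both reconstructions, information in Y about the
movies can influence the predictions for X, which is the sense in which
the factorization is "collective".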
