[Research] Auton Lab meetings to resume on Wednesday October 7th

Artur Dubrawski awd at cs.cmu.edu
Mon Sep 28 10:22:56 EDT 2009


Dear Autonians,

It is time to get back to business.

Wednesday noon has won, by a landslide, the popular vote
for the lab meeting time for 2009/10.
Thank you very much for participating in the survey.

We will start the series on Wednesday next week.
The topic and the exact location will be announced later.

This Wednesday, please consider attending a thesis defense
by a former Autonian, on a topic closely related to our
ongoing area of interest (see below for details).

Thanks
Artur



-----
Date: 9/30/09
Time: 12:00pm
Place: 1507 NSH

PhD Candidate: Ajit Singh

Title: Efficient Models for Relational Learning
Abstract:
Information integration deals with the setting where one has multiple 
sources of data, each describing different properties of the same set of 
entities. We are concerned primarily with settings where the properties 
are pairwise relations between entities, and attributes of entities. We 
want to predict the value of relations and attributes, but relations 
between entities violate the basic statistical assumption of 
exchangeable data points, or entities. Furthermore, we desire models 
that scale gracefully as the number of entities and relations grows.

Matrices are the simplest form of relational data, and we begin by 
distilling the literature on low-rank matrix factorization into a small 
number of modelling choices. We then frame information integration as 
simultaneously factoring sets of related matrices: i.e., Collective 
Matrix Factorization. Each entity is described by a small number of 
parameters, and if an entity is described by more than one matrix, those 
parameters participate in multiple matrix factorizations. Maximum 
likelihood estimation of the resulting model involves a large non-convex 
optimization, which we reduce to cyclically solving convex optimizations 
over small subsets of the parameters. Each convex subproblem can be 
solved by Newton-Raphson, which we extend to the setting of stochastic 
Newton-Raphson.
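The cyclic scheme described above can be sketched in a few lines. The
sketch below is an illustrative assumption, not the thesis's
implementation: it uses a squared loss and exact ridge-regularized
least-squares updates in place of the (stochastic) Newton-Raphson
steps, and all matrix names and sizes are made up for the example. Two
matrices X (users x movies) and Y (movies x genres) share the movie
factors V, so V's subproblem stacks both factorizations:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_genres, k = 30, 20, 5, 4
X = rng.standard_normal((n_users, n_movies))   # e.g. a ratings matrix
Y = rng.standard_normal((n_movies, n_genres))  # e.g. genre side information

U = rng.standard_normal((n_users, k))
V = rng.standard_normal((n_movies, k))   # shared entity factors
W = rng.standard_normal((n_genres, k))
lam = 0.1  # small ridge term keeps each subproblem strictly convex

def ridge_solve(A, B, lam):
    """Exact minimizer of ||B - Z @ A.T||_F^2 + lam * ||Z||_F^2 over Z."""
    r = A.shape[1]
    return B @ A @ np.linalg.inv(A.T @ A + lam * np.eye(r))

def objective(U, V, W):
    return (np.sum((X - U @ V.T) ** 2) + np.sum((Y - V @ W.T) ** 2)
            + lam * (np.sum(U**2) + np.sum(V**2) + np.sum(W**2)))

start = objective(U, V, W)
for _ in range(20):
    U = ridge_solve(V, X, lam)     # fix V: the subproblem in U is convex
    W = ridge_solve(V, Y.T, lam)   # fix V: the subproblem in W is convex
    # V participates in both factorizations, so its subproblem stacks both
    V = ridge_solve(np.vstack([U, W]), np.hstack([X.T, Y]), lam)
end = objective(U, V, W)
print(end < start)  # True: each exact cyclic pass cannot increase the loss
```

Because every subproblem is solved exactly, the overall (non-convex)
objective decreases monotonically across passes, which is the property
the cyclic decomposition is meant to exploit.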

To address the limitations of maximum likelihood estimation in matrix 
factorization models, we extend our approach to the hierarchical 
Bayesian setting. Here, Bayesian estimation involves computing a 
high-dimensional integral with no analytic form. If we resorted to 
standard Metropolis-Hastings techniques, slow mixing would limit the 
scalability of our approach to large sets of entities. We show how to 
accelerate Metropolis-Hastings by using our efficient solution for 
maximum likelihood estimation to guide the sampling process.
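One simple way a point estimate can guide a sampler, shown here only
as a hedged toy sketch (a one-dimensional Gaussian mean, not the
thesis's algorithm), is an independence Metropolis-Hastings chain
whose proposal is centered at the MLE with a curvature-based spread,
so the chain starts and proposes near the mode instead of mixing
slowly from an arbitrary starting point:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=200)

def log_post(mu):
    # Gaussian likelihood with known unit variance, flat prior
    return -0.5 * np.sum((data - mu) ** 2)

mle = data.mean()                    # cheap point estimate guides the sampler
prop_sd = 1.0 / np.sqrt(len(data))  # from the curvature of log_post at the MLE

def log_q(x):
    # log-density of the fixed Gaussian proposal centered at the MLE
    return -0.5 * ((x - mle) / prop_sd) ** 2

samples = []
mu, lp = mle, log_post(mle)          # start the chain at the mode
for _ in range(5000):
    cand = rng.normal(mle, prop_sd)  # independence proposal at the MLE
    lp_cand = log_post(cand)
    # independence-sampler ratio corrects for the proposal density
    log_alpha = (lp_cand - lp) + (log_q(mu) - log_q(cand))
    if np.log(rng.uniform()) < log_alpha:
        mu, lp = cand, lp_cand
    samples.append(mu)

post_mean = np.mean(samples[1000:])  # close to the MLE, as expected here
```

When the proposal matches the posterior well, as in this toy case,
acceptance is high and the chain mixes essentially immediately; the
thesis applies the same idea in a far higher-dimensional setting.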

This thesis rests on two claims: (i) that Collective Matrix 
Factorization can effectively integrate different sources of data to 
improve prediction; and (ii) that training scales well as the number of 
entities and observations grows. Two real-world data sets are 
considered in experimental support of these claims: augmented 
collaborative filtering and augmented brain imaging. In augmented 
collaborative filtering, we show that genre information about movies can 
be used to increase the accuracy of predicted user ratings. In 
augmented brain imaging, we show that word co-occurrence information can 
be used to increase the predictive accuracy of a model of changes in 
brain activity in response to word stimuli, even in regions of the brain 
that were 
never included in the training data.

http://www.cs.cmu.edu/~ajit/pubs/ajit-thesis-submitted.pdf 

Thesis Committee:
Geoff Gordon (chair)
Tom Mitchell
Christos Faloutsos
Pedro Domingos (University of Washington)



