[Research] Auton Lab meetings to resume on Wednesday October 7th
Artur Dubrawski
awd at cs.cmu.edu
Mon Sep 28 10:22:56 EDT 2009
Dear Autonians,
It is time to get back to business.
Wednesday noon has won, by a landslide, the popular vote
for the lab meeting time for 2009/10.
Thank you very much for participating in the survey.
We will start the series on Wednesday next week.
The topic and the exact location will be announced later.
This Wednesday, please consider attending a thesis defense
by a former Autonian, on a topic closely related to our
ongoing area of interest (see below for details).
Thanks
Artur
-----
Date: 9/30/09
Time: 12:00pm
Place: 1507 NSH
PhD Candidate: Ajit Singh
Title: Efficient Models for Relational Learning
Abstract:
Information integration deals with the setting where one has multiple
sources of data, each describing different properties of the same set of
entities. We are concerned primarily with settings where the properties
are pairwise relations between entities, and attributes of entities. We
want to predict the values of relations and attributes, but relations
between entities violate the basic statistical assumption of
exchangeable data points, or entities. Furthermore, we desire models
that scale gracefully as the number of entities and relations increases.
Matrices are the simplest form of relational data, and we begin by
distilling the literature on low-rank matrix factorization into a small
number of modelling choices. We then frame information integration as
simultaneously factoring sets of related matrices: i.e., Collective
Matrix Factorization. Each entity is described by a small number of
parameters, and if an entity is described by more than one matrix, those
parameters participate in multiple matrix factorizations. Maximum
likelihood estimation of the resulting model involves a large non-convex
optimization, which we reduce to cyclically solving convex optimizations
over small subsets of the parameters. Each convex subproblem can be
solved by Newton-Raphson, which we extend to the setting of stochastic
Newton-Raphson.
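The cyclic scheme above can be illustrated with a small sketch. This is not the thesis's implementation: it uses squared loss (for which the Newton step reduces to a ridge regression) on two toy matrices X ≈ UVᵀ and Y ≈ VWᵀ that share the factor V, and the matrix sizes, rank k, weight alpha, and regularizer lam are all made-up illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy relational data: X (users x movies) and Y (movies x genres)
# share the movie factor V, so fitting both couples the factorizations.
n_users, n_movies, n_genres, k = 30, 40, 5, 4
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_movies, k))
W_true = rng.normal(size=(n_genres, k))
X = U_true @ V_true.T + 0.01 * rng.normal(size=(n_users, n_movies))
Y = V_true @ W_true.T + 0.01 * rng.normal(size=(n_movies, n_genres))

def ridge_solve(A, B, lam=0.1):
    # Solve min_Z ||B - A Z^T||^2 + lam ||Z||^2 for the factor Z.
    # For squared loss this closed form *is* the Newton step.
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ B).T

# Cycle over convex subproblems: with the other factors held fixed,
# each factor update is a convex (here, ridge-regression) problem.
U = rng.normal(size=(n_users, k))
V = rng.normal(size=(n_movies, k))
W = rng.normal(size=(n_genres, k))
alpha = 0.5  # relative weight of the two reconstruction losses
for _ in range(50):
    U = ridge_solve(V, X.T)  # fit X ~ U V^T in U, V fixed
    W = ridge_solve(V, Y)    # fit Y ~ V W^T in W, V fixed
    # V participates in both matrices: stack both regression problems.
    A = np.vstack([np.sqrt(alpha) * U, np.sqrt(1 - alpha) * W])
    B = np.vstack([np.sqrt(alpha) * X, np.sqrt(1 - alpha) * Y.T])
    V = ridge_solve(A, B)

loss = np.linalg.norm(X - U @ V.T) ** 2 + np.linalg.norm(Y - V @ W.T) ** 2
```

The key point is the V update: because V's parameters appear in both factorizations, its subproblem stacks the rows of X and Yᵀ, which is how information from one matrix flows into predictions for the other.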
To address the limitations of maximum likelihood estimation in matrix
factorization models, we extend our approach to the hierarchical
Bayesian setting. Here, Bayesian estimation involves computing a
high-dimensional integral with no analytic form. If we resorted to
standard Metropolis-Hastings techniques, slow mixing would limit the
scalability of our approach to large sets of entities. We show how to
accelerate Metropolis-Hastings by using our efficient solution for
maximum likelihood estimation to guide the sampling process.
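One simple way a point estimate can guide Metropolis-Hastings is via an independence proposal centered at that estimate, so candidates land in the high-probability region from the first step. The sketch below shows this idea on a hypothetical one-parameter Gaussian model; it is an illustration of the general trick, not the sampler used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: theta ~ N(0, 10) prior, observations y_i ~ N(theta, 1).
y = rng.normal(loc=2.0, scale=1.0, size=50)

def log_post(theta):
    # Unnormalized log posterior: log prior + log likelihood.
    return -theta ** 2 / (2 * 10.0) - 0.5 * np.sum((y - theta) ** 2)

# Step 1: a cheap maximum-likelihood estimate (here just the sample
# mean) fixes the center and scale of an independence proposal.
theta_ml = y.mean()
prop_sd = 1.0 / np.sqrt(len(y))  # rough curvature scale of the likelihood

def log_prop(theta):
    # Log density (up to a constant) of the N(theta_ml, prop_sd) proposal.
    return -((theta - theta_ml) ** 2) / (2 * prop_sd ** 2)

# Step 2: Metropolis-Hastings with that independence proposal.
theta = theta_ml
samples = []
for _ in range(5000):
    cand = rng.normal(theta_ml, prop_sd)
    log_ratio = (log_post(cand) - log_post(theta)
                 + log_prop(theta) - log_prop(cand))
    if np.log(rng.uniform()) < log_ratio:
        theta = cand
    samples.append(theta)

post_mean = np.mean(samples[1000:])
```

Because the proposal already sits on the posterior mode, the chain mixes far faster than a random walk started from an arbitrary point, which is the scalability benefit the abstract alludes to.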
This thesis rests on two claims: (i) that Collective Matrix
Factorization can effectively integrate different sources of data to
improve prediction; and (ii) that training scales well as the number of
entities and observations increases. Two real-world data sets are
considered in experimental support of these claims: augmented
collaborative filtering and augmented brain imaging. In augmented
collaborative filtering, we show that genre information about movies can
be used to increase the predictive accuracy of users' ratings. In
augmented brain imaging, we show that word co-occurrence information can
be used to increase the predictive accuracy of a model of changes in
brain activity in response to word stimuli, even in regions of the brain that were
never included in the training data.
http://www.cs.cmu.edu/~ajit/pubs/ajit-thesis-submitted.pdf
Thesis Committee:
Geoff Gordon (chair)
Tom Mitchell
Christos Faloutsos
Pedro Domingos (University of Washington)