From sabhnani+ at cs.cmu.edu  Tue Feb  1 11:26:16 2011
From: sabhnani+ at cs.cmu.edu (Robin Sabhnani)
Date: Tue, 01 Feb 2011 11:26:16 -0500
Subject: [Research] [Fwd: new $3M prize for great machine learning...]
Message-ID: <4D483428.4000009@cs.cmu.edu>

FYI

-------------- next part --------------
An embedded message was scrubbed...
From: Tom Mitchell
Subject: new $3M prize for great machine learning...
Date: Mon, 31 Jan 2011 18:03:41 -0500
Size: 6515
URL:

From sabhnani at cmu.edu  Mon Feb 14 08:45:21 2011
From: sabhnani at cmu.edu (Mahesh Sabhnani)
Date: Mon, 14 Feb 2011 08:45:21 -0500
Subject: [Research] [Fwd: [ML Lunch] Monday, 2/14, noon, **NSH 3305**: Anima Anandkumar on "High-Dimensional Structure Learning of Graphical Models: Trees, Latent trees and Beyond"]
Message-ID: <524f8523f08ab08f53fdef147b93ea91.squirrel@webmail.andrew.cmu.edu>

This might be of interest to some people in the lab.

---------------------------- Original Message ----------------------------
Subject: [ML Lunch] Monday, 2/14, noon, **NSH 3305**: Anima Anandkumar on "High-Dimensional Structure Learning of Graphical Models: Trees, Latent trees and Beyond"
From: "Sue Ann Hong"
Date: Mon, February 14, 2011 12:18 am
To: learning-lunch-seminar at cs.cmu.edu
--------------------------------------------------------------------------

Machine Learning Lunch (http://www.cs.cmu.edu/~learning/)

Speaker: Anima Anandkumar
Venue: NSH 3305
Date: Monday, Feb. 14, 2011
Time: 12:00 noon

Food will be provided.

Title: "High-Dimensional Structure Learning of Graphical Models: Trees, Latent trees and Beyond"

Abstract:
Graphical models, or Markov random fields, provide a graph-based framework for capturing dependencies between the random variables of a large-scale multivariate distribution. This interdisciplinary topic has found widespread application in a variety of areas including image processing, bioinformatics, combinatorial optimization and machine learning.
Estimating the graph structure of the model from samples drawn from it is an important task, since the structure reveals relationships between the variables. Structure estimation faces several challenges, however: for general graphical models it is NP-hard; the models are typically in the high-dimensional regime, where the number of variables far exceeds the number of samples; and some variables may be latent, i.e., never observed. In this talk I will address these challenges and provide solutions for certain classes of models.

The first part of the talk focuses on latent tree models: tree graphical models with latent variables, where neither the number nor the location of the latent variables is known. We have developed novel algorithms that are consistent, computationally efficient and have low sample complexity. These algorithms exploit the fact that the correlations of a tree model induce an additive metric on the tree. The first algorithm uses this property to test sibling relationships between pairs of nodes and builds the tree recursively. The second algorithm first builds a tree over the observed variables and then adds hidden nodes step by step, operating only on small subsets of variables; this yields considerable computational savings over the first algorithm. For experiments on real data, we modify the second algorithm to trade off the number of added latent variables against the accuracy of the resulting model fit via the Bayesian Information Criterion (BIC). Experiments on S&P 100 monthly returns data and on the occurrence of words in newsgroups reveal interesting relationships.

In the second part, I will present recent results on learning graphical models on sparse Erdos-Renyi random graphs. These random graphs are relevant in social networks.
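The sibling test behind the first latent-tree algorithm can be sketched on a toy example. The sketch below is illustrative only and not from the talk: it assumes the standard information distance d(i,j) = -log|corr(X_i, X_j)|, which is additive along tree paths, and hard-codes exact distances for a small four-leaf tree rather than estimating them from samples. The key fact it demonstrates is that i and j share a parent exactly when d(i,k) - d(j,k) does not depend on the third leaf k.

```python
import itertools

# Toy additive tree metric over observed leaves {a, b, c, d} of the tree
#   a --1-- h1 --3-- h2 --1.5-- c
#   b --2-- h1        h2 --2.5-- d
# where h1, h2 are hidden. In practice d(i, j) = -log|corr(X_i, X_j)|
# would be estimated from samples; here the exact path lengths are given.
d = {
    ("a", "b"): 3.0, ("a", "c"): 5.5, ("a", "d"): 6.5,
    ("b", "c"): 6.5, ("b", "d"): 7.5, ("c", "d"): 4.0,
}

def dist(i, j):
    return 0.0 if i == j else d.get((i, j), d.get((j, i)))

def are_siblings(i, j, leaves, tol=1e-9):
    """Leaves i and j share a parent iff d(i,k) - d(j,k) takes the same
    value for every other leaf k (a consequence of metric additivity)."""
    diffs = [dist(i, k) - dist(j, k) for k in leaves if k not in (i, j)]
    return max(diffs) - min(diffs) < tol

leaves = ["a", "b", "c", "d"]
for i, j in itertools.combinations(leaves, 2):
    print(i, j, are_siblings(i, j, leaves))
# only (a, b) and (c, d) are reported as sibling pairs
```

A recursive-grouping style procedure would merge each detected sibling pair under a new hidden node, recompute distances to that node, and repeat until the full tree is recovered.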
Since these graphs are locally tree-like, and learning tree models is tractable, it is natural to ask whether structure learning is feasible for these models. We provide a positive answer when the model is in the so-called uniqueness regime, where long-range correlations decay. The algorithm is based on a set of conditional mutual information tests and is shown to be consistent for structure estimation with almost order-optimal sample complexity. A simpler algorithm based on correlation thresholding is also consistent, but under more stringent conditions. Finally, time permitting, I will briefly mention related work on consistent estimation of high-dimensional forest distributions and on the characterization of extremal tree structures with respect to error rates for structure learning.

About the speaker:
Anima Anandkumar received her B.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Madras in 2004 and her MS and PhD degrees in Electrical Engineering from Cornell University, Ithaca, NY, in 2009. She was a post-doctoral researcher in the Stochastic Systems Group at MIT, Cambridge, MA, and has been an assistant professor in the EECS Department at UC Irvine since July 2010. She is the recipient of the 2009 ACM SIGMETRICS Best Thesis Award, the 2008 IEEE Signal Processing Society Young Author Best Paper Award, the 2008 IBM Fran Allen PhD Fellowship, and a student paper award at the 2006 IEEE ICASSP. Her research interests are in statistical signal processing, network theory and information theory.
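The simpler correlation-thresholding idea mentioned in the abstract can be illustrated on a toy Gaussian chain. This is a sketch under assumptions not in the announcement: a three-variable Markov chain in which the correlation between non-neighbours is the product of the edge correlations, and a hand-picked threshold tau sitting between the edge and non-edge correlation values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Toy Gaussian Markov chain X0 - X1 - X2 (true edges: (0,1) and (1,2)).
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + 0.6 * rng.standard_normal(n)   # corr(x0, x1) = 0.8
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)   # corr(x1, x2) = 0.8
X = np.stack([x0, x1, x2])

# By the Markov property corr(x0, x2) = 0.8 * 0.8 = 0.64, so any
# threshold in (0.64, 0.8) recovers exactly the true edge set.
corr = np.corrcoef(X)
tau = 0.7                       # assumed threshold, chosen by hand here
edges = {(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(corr[i, j]) > tau}
print(sorted(edges))            # prints [(0, 1), (1, 2)]
```

On general graphs this only works under the stringent conditions the abstract alludes to; when correlations between distant nodes do not fall below those between neighbours, no single threshold separates edges from non-edges, which is where the conditional mutual information tests come in.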