[CL+NLP Lunch] Sujith Ravi talk, Friday Oct 24, 11am (GHC 2109)
Chris Dyer
cdyer at cs.cmu.edu
Fri Oct 17 20:17:36 EDT 2014
Sujith Ravi (Google) will be on campus next Friday, Oct 24, and will give a
talk in GHC 2109 at 11am. Details below.
If you would like to meet with the speaker, please indicate your
availability here:
https://docs.google.com/document/d/1JjaXfGzt4Y__B1_ORJejOWgJaeGPSo56zdnd0dPFoxU/edit?usp=sharing
Title: Large-scale Structure Prediction for Natural Language Processing
Abstract:
Natural language processing (NLP) systems have become ubiquitous for
data analysis in digital environments such as the Web and social
media. While great progress has been made in a wide range of areas,
building NLP systems from scratch still remains a daunting challenge
for many applications, especially when there is a need to target
different domains, languages or users. Current NLP systems heavily
rely on expensive human-annotated data and struggle to effectively
scale to the volume and characteristics of changing data environments,
complex modeling choices and a wide range of applications. Overcoming
these challenges requires new advances in inference algorithms and
efficient approximate learning methods that reduce the computational
complexity involved in structured prediction problems.
In this talk, I will present a series of new powerful general-purpose
learning algorithms for large-scale structured prediction applicable
to a wide range of tasks in NLP, IR, speech and computer vision. This
work introduces novel algorithms for fast unsupervised and
semi-supervised learning that address current challenges. Unlike
existing methods, the new approach scales to large data sizes and
dimensionality as well as complex structured models. The new
approaches fall under two major paradigms commonly used in machine
learning: “probabilistic inference” and “graph optimization”. This
talk will focus on the former: I will describe a new approach for
fitting mixtures of exponential families, which generalizes several
probabilistic models used in NLP and other areas. A major contribution
of our work is a novel sampling method that uses randomized techniques
like locality sensitive hashing to achieve high throughput in
generating proposals during sampling. This method scales very easily
to large data and model sizes, achieving speedups of several
orders of magnitude over existing toolkits, and outperforms
state-of-the-art systems on a wide variety of structured prediction
tasks ranging from clustering to topic modeling to machine
translation. Moreover, we can efficiently parallelize the algorithm on
modern computing platforms to achieve even higher throughputs. In
addition, we also prove probabilistic error guarantees for the new
algorithm. These novel techniques show great promise for tackling
other complex AI problems such as deep language understanding and
building joint models of language and vision.
Bio: Sujith Ravi has been a Research Scientist at Google since 2012. Prior
to that he was a Research Scientist at Yahoo! Research. He completed
his PhD at the University of Southern California/Information Sciences
Institute. His main research interests span various problems and
theory related to the fields of Natural Language Processing (NLP) and
Machine Learning. He won the SIGKDD 2014 Best Research Paper Award and
a Best Paper Award nomination at ACL 2009. He is specifically
interested in large-scale unsupervised and semi-supervised methods and
their applications to structured prediction problems in NLP,
information extraction, multi-modal learning for language/vision, user
modeling in social media, graph optimization algorithms for
summarizing noisy data, computational decipherment and computational
advertising. He has published over 30 peer-reviewed papers in top-tier
conferences and journals. He was the organizer of the ICML-NAACL
symposium in 2013, Conference Workshop Co-Chair for NAACL-HLT 2013 and
serves on the PC for ACL, ICML, NIPS, NAACL, EMNLP, AAAI, KDD and
WSDM. His work has been reported in several magazines such as New
Scientist and ACM TechNews.
Homepage: http://www.sravi.org