[CL+NLP Lunch] CL+NLP Lunch, Wei Xu, Monday November 3rd @ 12:00pm

Tue Oct 28 16:20:48 EDT 2014

Please join us for the next CL+NLP lunch at noon on Monday November 3rd,
where Wei Xu will be speaking about modeling paraphrase. Lunch will be
provided!

To arrange meetings with Wei on Monday, please contact Dallas Card
(dcard at cmu.edu).

---
CL+NLP lunch <http://www.cs.cmu.edu/~nlp-lunch/>
Monday, November 3rd at 12:00pm
GHC 6501

Speaker: Wei Xu, University of Pennsylvania

TITLE: Modeling Lexically Divergent Paraphrases in Twitter (and Shakespeare!)

ABSTRACT:
Paraphrases are alternative linguistic expressions of the same meaning.
Identifying paraphrases is fundamental to many natural language processing
tasks and has been extensively studied for standard contemporary
English. In this talk I will present MULTIP (Multi-instance Learning
Paraphrase Model), a joint word-sentence alignment model suited to
identifying paraphrases in noisy user-generated text on Twitter. The
model infers latent word-level paraphrase anchors from only sentence-level
annotations during learning. This is a major departure from previous
approaches that rely on lexical or distributional similarities over
sentence pairs. By reducing the dependence on word overlap as evidence of
paraphrase, our approach identifies more lexically divergent expressions
with equivalent meaning. For experiments, we constructed a Twitter
Paraphrase Corpus of about 19,000 sentences using a novel and efficient
crowdsourcing methodology. Our new approach improves the state-of-the-art
performance of a method that combines a latent space model with a
feature-based supervised classifier. I will also present findings on
paraphrasing between standard English and Shakespearean styles.

Joint work with Chris Callison-Burch (UPenn), Bill Dolan (MSR), Alan
Ritter (OSU), Yangfeng Ji (GaTech), Colin Cherry (NRC) and Ralph Grishman
(NYU).

Wei Xu is a postdoc in the Computer and Information Science Department at
the University of Pennsylvania, working with Chris Callison-Burch. Her
research focuses on paraphrases, social media and information extraction.
She received her PhD in Computer Science from New York University. She is
organizing the SemEval-2015 shared task on "Paraphrase and Semantic
Similarity in Twitter". During her PhD, she visited the University of
Washington for two years and interned at Microsoft Research, ETS and
Amazon.com.