From dyogatama at cs.cmu.edu Tue Apr 16 01:25:09 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Tue, 16 Apr 2013 01:25:09 -0400 Subject: [CL+NLP Lunch] CL+NLP Lunch Tuesday April 23 @ noon Message-ID: Hi all, We are excited to announce that Shomir Wilson will speak at the CL+NLP Lunch. Details are included below. Lunch will be provided. Thanks, Dani =================================================== *CL+NLP Lunch* (http://www.cs.cmu.edu/~nlp-lunch/) *Speaker*: Shomir Wilson, Carnegie Mellon University *Date*: Tuesday, April 23, 2013 *Time*: 12:00 noon *Venue*: GHC 4405 *Title*: A Computational Approach to Metalanguage and the Use-Mention Distinction *Abstract*: In linguistic communication it is sometimes necessary to refer to features of language, such as orthography, vocabulary, structure, pragmatics, or meaning. Metalanguage enables a speaker to select a linguistically-relevant referent over (or in addition to) other typical referents. Metalanguage is both pervasive and, paradoxically, the subject of limited attention in research on language technologies. The ability to produce and understand metalanguage is a core linguistic competency that allows humans to establish grounding, verify audience understanding, and maintain communication channels in spite of perturbations. Metalanguage encodes unusually direct and salient information about language, but simple examples thwart parsers and other common language analysis tools. Its roles in L2 language acquisition, expression of sentiment towards others' utterances, and some theories of irony have been noted as well. In this talk, I will first present on a framework for identifying and analyzing instances of metalanguage, in an effort to reconcile the many theoretical treatments of the phenomenon for empirical use. This will include a definition of mentioned language, a common form of metalanguage with many practical roles in communication. 
I will then describe the creation of the first tagged and delineated corpus of English metalanguage, built by applying a combination of stylistic and lexical heuristics to Wikipedia article text. Finally, I will present preliminary results from using NLP methods to automatically identify mentioned language in text. These contributions validate the feasibility of building language technologies that can exploit the salient information about language that metalanguage encodes. *Biography*: Shomir Wilson is a Postdoctoral Associate with the Mobile Commerce Lab at Carnegie Mellon University's Institute for Software Research. He is also an NSF International Research Fellow, and will spend a year at the University of Edinburgh starting this July. He received his PhD in Computer Science from the University of Maryland in 2011, and during graduate school he twice received grants from the NSF's East Asia and Pacific Summer Institutes. His most recent research is in usable privacy and security; he is also interested in how people speak about language, and what computers can learn from metalanguage and metadialogue.
From dyogatama at cs.cmu.edu Tue Apr 23 07:25:56 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Tue, 23 Apr 2013 07:25:56 -0400 Subject: [CL+NLP Lunch] CL+NLP Lunch Today April 23 @ noon Message-ID: *CL+NLP Lunch *(http://www.cs.cmu.edu/~nlp-lunch/) *Speaker*: Shomir Wilson, Carnegie Mellon University *Date*: Tuesday, April 23, 2013 *Time*: 12:00 noon *Venue*: GHC 4405 *Title*: A Computational Approach to Metalanguage and the Use-Mention Distinction *Abstract*: In linguistic communication it is sometimes necessary to refer to features of language, such as orthography, vocabulary, structure, pragmatics, or meaning. Metalanguage enables a speaker to select a linguistically-relevant referent over (or in addition to) other typical referents. 
Metalanguage is both pervasive and, paradoxically, the subject of limited attention in research on language technologies. The ability to produce and understand metalanguage is a core linguistic competency that allows humans to establish grounding, verify audience understanding, and maintain communication channels in spite of perturbations. Metalanguage encodes unusually direct and salient information about language, but simple examples thwart parsers and other common language analysis tools. Its roles in L2 language acquisition, expression of sentiment towards others' utterances, and some theories of irony have been noted as well. In this talk, I will first present on a framework for identifying and analyzing instances of metalanguage, in an effort to reconcile the many theoretical treatments of the phenomenon for empirical use. This will include a definition of mentioned language, a common form of metalanguage with many practical roles in communication. I will then describe the creation of the first tagged and delineated corpus of English metalanguage, built by applying a combination of stylistic and lexical heuristics to Wikipedia article text. Finally, I will present preliminary results from using NLP methods to automatically identify mentioned language in text. These contributions validate the feasibility of building language technologies that can exploit the salient information about language that metalanguage encodes. *Biography*: Shomir Wilson is a Postdoctoral Associate with the Mobile Commerce Lab at Carnegie Mellon University's Institute for Software Research. He is also an NSF International Research Fellow, and will spend a year at the University of Edinburgh starting this July. He received his PhD in Computer Science from the University of Maryland in 2011, and during graduate school he twice received grants from the NSF's East Asia and Pacific Summer Institutes. 
His most recent research is in usable privacy and security; he is also interested in how people speak about language, and what computers can learn from metalanguage and metadialogue.
From dyogatama at cs.cmu.edu Tue May 14 12:22:46 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Tue, 14 May 2013 12:22:46 -0400 Subject: [CL+NLP Lunch] CL+NLP Lunch May 20 @ noon Message-ID: *CL+NLP Lunch *(http://www.cs.cmu.edu/~nlp-lunch/) *Speaker*: André Martins, Priberam Labs *Date*: Monday, May 20, 2013 *Time*: 12:00 noon *Venue*: GHC 4405 *Title*: Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning *Abstract*: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC-2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers. This work was done jointly with Miguel Almeida at Priberam Labs, and will appear soon at ACL 2013. *Biography*: André Martins is a research scientist at Priberam Labs, in Lisbon, Portugal. He received his dual-degree PhD in Language Technologies in 2012 from Carnegie Mellon University and Instituto Superior Técnico. His PhD dissertation was awarded Honorable Mention in CMU's SCS Dissertation Award competition. Martins' research interests include natural language processing, machine learning, structured prediction, sparse modeling, and optimization. 
He received a best paper award at the ACL 2009 conference and the Portuguese IBM 2011 Scientific Prize.
From dyogatama at cs.cmu.edu Wed Oct 16 14:26:23 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Wed, 16 Oct 2013 14:26:23 -0400 Subject: [CL+NLP Lunch] Jan Botha, CL+NLP Lunch Oct 24 @ noon Message-ID: *CL+NLP Lunch *(*http://www.cs.cmu.edu/~nlp-lunch/*) *Speaker*: Jan Botha, Oxford University *Date*: Thursday, October 24, 2013 *Time*: 12:00 noon *Venue*: GHC 6115 *Title*: Unsupervised learning of non+concatenative morphology *Abstract*: The popular view of words as sequences of morphemes may work for unsupervised morphological analysis of various languages, but it is overly simplistic in the face of non-concatenative phenomena such as root-templatic stem derivation in Semitic languages. I'll present a nonparametric Bayesian approach that addresses concatenative and non-concatenative morphology simultaneously. Experiments on Arabic and Hebrew show that the richer account of stem morphology improves morphological segmentation. Identification of discontiguous root morphemes is fairly accurate and could be a source of features for downstream language processing tasks. To illustrate the flexibility of the approach, I'll also sketch some untested instantiations targeting other non-concatenative processes such as circumfixing and infixing. *Biography*: Jan Botha is a fourth-year PhD student at Oxford University. As a member of the Computational Linguistics Group, his research focuses on statistical modelling of morphologically rich languages. This interest has led him on excursions into Bayesian nonparametrics and, more recently, distributed representation learning. Before moving to Oxford to take up his Rhodes scholarship, he completed an interdisciplinary Honours Bachelors degree in Physics, Maths and Computer Science at Stellenbosch University in South Africa. 
From cdyer at cs.cmu.edu Wed Oct 16 17:31:33 2013 From: cdyer at cs.cmu.edu (Chris Dyer) Date: Wed, 16 Oct 2013 17:31:33 -0400 Subject: [CL+NLP Lunch] Jan Botha, CL+NLP Lunch Oct 24 @ noon In-Reply-To: References: Message-ID: If you are interested in meeting with Jan (who has done a lot of work on morphology, language modeling, and machine translation), sign up here: https://docs.google.com/document/d/12-wTK4nXlkL2T27TewQUgNX1kWs1rg4FiGUhIU2Ac9w/edit
From dyogatama at cs.cmu.edu Thu Oct 24 00:00:29 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Thu, 24 Oct 2013 00:00:29 -0400 Subject: [CL+NLP Lunch] Jan Botha, CL+NLP Lunch Oct 24 @ noon In-Reply-To: References: Message-ID: reminder: today at noon. lunch will be provided.
From dyogatama at cs.cmu.edu Wed Nov 13 16:17:24 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Wed, 13 Nov 2013 16:17:24 -0500 Subject: [CL+NLP Lunch] CL+NLP Lunch November 19 @ 12.30pm: Vinodkumar Prabhakaran In-Reply-To: References: Message-ID: Sign up here if you are interested in meeting with Vinod https://docs.google.com/document/d/1clcCzA_YT42g9U_wlzcf-e2oXVJ4Wcke_kmtdqFL3SM/edit
From dyogatama at cs.cmu.edu Wed Nov 13 16:14:48 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Wed, 13 Nov 2013 16:14:48 -0500 Subject: [CL+NLP Lunch] CL+NLP Lunch November 19 @ 12.30pm: Vinodkumar Prabhakaran Message-ID: *CL+NLP Lunch *(*http://www.cs.cmu.edu/~nlp-lunch/ *) *Speaker*: Vinodkumar Prabhakaran, Columbia University *Date*: Tuesday, November 19, 2013 *Time*: 12:30pm *(note: 12.30pm not at noon)* *Venue*: GHC 4405 *Title:* Manifestations of Social Power in Interactions *Abstract:* In this talk, I will present the study on how social power relations affect the way people interact with one another in both online and offline settings and how we can use statistical machine learning techniques to detect these power relations automatically. The talk will cover studies on two different domains: A) Enron email corpus (written, task-oriented) and B) 2012 Republican Presidential Primary debates (spoken, persuasive). In part A, we explored four different types of power (influence, hierarchical power, situational power, and power over communication) within an organizational setting. We found that these four types of power i) manifest in the structure of the dialog, ii) are different in the ways they manifest in dialog and iii) can be predicted using automatic means with reasonable performance. In part B, the presidential primary debates, we modeled power based on the candidates' relative position in the recent polls released prior to each debate. We found that a candidate's power affects the way they interact with others in the debate as well as the way others interact with them. I will also present an automatic power ranker system to rank candidates in terms of their relative power based on linguistic and structural features. *Bio:* Vinodkumar Prabhakaran is a 4th year PhD student working under the supervision of Dr. Owen Rambow at the Center for Computational Learning Systems (CCLS). 
His research focuses on statistical machine learning techniques for NLP and spans across different areas within NLP such as computational sociolinguistics, semantic analysis and biomedical information extraction. His thesis focuses on analyzing social interactions to detect social power relations between interactants, across various type of power, genres and domains. His work has been published at various international venues such as WWW, NAACL, COLING, IJCNLP and ECAI.
From dyogatama at cs.cmu.edu Tue Nov 19 11:58:26 2013 From: dyogatama at cs.cmu.edu (Dani Yogatama) Date: Tue, 19 Nov 2013 11:58:26 -0500 Subject: [CL+NLP Lunch] CL+NLP Lunch November 19 @ 12.30pm: Vinodkumar Prabhakaran In-Reply-To: References: Message-ID: reminder: in 30 minutes