From nathan at cmu.edu Thu Jan 6 18:11:09 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Thu, 6 Jan 2011 18:11:09 -0500
Subject: [CL+NLP Lunch] Jan. 18 & Linguistics Reading Group
Message-ID:

All,

A couple of quick announcements as we gear up for the new year:

First, a heads-up to mark your calendar for the next CL+NLP Lunch on Tuesday, Jan. 18 (noon, GHC 6115). Chris Dyer will discuss morphological modeling for machine translation. A more detailed announcement will be sent out next week. At this point we expect subsequent lunches this semester to take place on Tuesdays as well.

Second, we wanted to issue a reminder about the Linguistics Reading Group. This group meets weekly for informal discussion of linguistics topics and papers of interest to attendees. Please email me if you wish to join the reading group email list. Of particular interest, the reading group's meeting this coming Monday will feature a special guest speaker: Stephanie Shih (from Stanford Linguistics) will present on the role of rhythm in the English genitive alternation (e.g. what motivates speakers to say "the car's wheel" vs. "the wheel of the car").

Cheers,
Nathan & Ben

From nathan at cmu.edu Thu Jan 13 17:32:49 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Thu, 13 Jan 2011 17:32:49 -0500
Subject: [CL+NLP Lunch] Tuesday: Chris Dyer, morphology in machine translation
Message-ID:

Everyone,

On Tuesday, Chris Dyer of LTI will speak to the CL+NLP Lunch. Details are below. (He plans to finish by 1:00.)

Ben & Nathan

Tuesday, Jan. 18 @ noon
GHC 6115

Chris Dyer
Postdoctoral Fellow, LTI

Inflectional Morphology in Probabilistic Translation Models

Abstract:
In conventional translation models, words that differ from each other in any way are modeled independently of each other. From a modeling perspective, this is unsatisfying since closely related morphological forms of an underlying stem are likely to share many characteristics that are important for translation.
And, more practically, this independence assumption means data sparsity is a significant issue in translation between morphologically complex languages. I compare two new probabilistic translation models that relax this "lexical independence assumption" and share statistics across morphologically related word forms. The first model is generative, based on hierarchical Pitman-Yor processes, in which the translation distributions for different inflection variants of a stem share a common base distribution. The second model is based on Markov random fields and uses morphological features to share information across related forms.

Lunch will be provided.

From benlambert at cmu.edu Mon Jan 17 21:46:16 2011
From: benlambert at cmu.edu (Benjamin Lambert)
Date: Mon, 17 Jan 2011 21:46:16 -0500
Subject: [CL+NLP Lunch] CL/NLP lunch: Chris Dyer, morphology in machine translation, TUESDAY@ noon
Message-ID: <08E82269-22E2-4BF1-B50E-70BF66882F0D@cmu.edu>

Hi everyone,

This is a friendly reminder that LTI postdoc Chris Dyer will be speaking on morphology in machine translation tomorrow (Tuesday!) at noon. We plan to be finished before 1pm, so folks can also attend Greg's thesis proposal at 1pm.

Ben & Nathan

Tuesday, Jan. 18 @ noon
GHC 6115

Chris Dyer
Postdoctoral Fellow, LTI

Inflectional Morphology in Probabilistic Translation Models

Lunch will be provided.

From nathan at cmu.edu Wed Feb 2 17:15:33 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Wed, 2 Feb 2011 17:15:33 -0500
Subject: [CL+NLP Lunch] Tuesday 2/8: Na-Rae Han on ESL Error Correction Models
Message-ID:

We are pleased to announce February's CL+NLP lunch:

*Tuesday, Feb. 8 @ noon*
GHC 4405
Lunch will be provided.

*Na-Rae Han*
Lecturer, Linguistics; Director, Robert Henderson Language Media Center, University of Pittsburgh

*Building ESL (English as a Second Language) Error Correction Models*

*Abstract:*
For many ESL (English as a Second Language) and EFL (English as a Foreign Language) students, interacting with computerized applications is an integral part of their learning experience. NLP-based language models can be a valuable tool in assisting teachers and students alike by providing prompt feedback on certain aspects of language. Recent research in this area has been directed towards the development of grammar correction applications specifically targeting learners of English.

In this talk, I will first present work on determiner correction using a statistical model trained on well-formed texts written by native English speakers. However, such an approach is limited in that constructing a large enough error-annotated corpus to support a statistical approach is time-consuming and labor-intensive. I will therefore describe a newer study focusing on preposition errors in which error correction models are trained exclusively on an error-annotated corpus produced by ESL learners. We address the design issues and the logistical problems that arise from the partially annotated nature of our data set.
*Bio:*
Na-Rae Han is currently Lecturer in the Linguistics department and Director of Robert Henderson Language Media Center, which promotes use of technology in language instruction, at the University of Pittsburgh. She received her Ph.D. in Linguistics from the University of Pennsylvania in 2006 after completing her M.S.E. in Computer and Information Science there. She has previously worked as a researcher in the Automated Scoring and Natural Language Processing Group of Educational Testing Service (ETS) in Princeton, NJ, and as a research professor at Korea University in Seoul, Korea.

See you on Tuesday,
Nathan & Ben

From nathan at cmu.edu Mon Feb 7 21:20:24 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Mon, 7 Feb 2011 21:20:24 -0500
Subject: [CL+NLP Lunch] Tuesday 2/8: Na-Rae Han on ESL Error Correction Models
In-Reply-To:
References:
Message-ID:

A reminder that Na-Rae Han will be speaking tomorrow at noon in GHC 4405. Hope to see you there!

Nathan & Ben

From nathan at cmu.edu Tue Mar 8 15:23:18 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Tue, 8 Mar 2011 15:23:18 -0500
Subject: [CL+NLP Lunch] next Tuesday at 12:30: Charles Kemp on Cross-Cultural Lexical Semantics
Message-ID:

All,

We are excited to announce this month's CL+NLP Lunch:

*Tuesday, March 15, 12:30pm*
NSH 3305
Food will be provided.
*Charles Kemp*
Professor, CMU Psychology

*Cross-Cultural Lexical Semantics: The Case of Kinship*

*Abstract:*
Given any medium-sized set of items, the number of ways to organize these items into categories is vast. What are the principles that explain which categories humans choose to name? I will present a computational approach to lexical semantics and will apply it to the domain of kinship. The account is based on two fundamental principles: good systems of categories are simple, and they enable informative communication. I will show that kinship terminologies in the world's languages achieve a near-optimal tradeoff between these competing principles. I will also suggest that this general-purpose account helps to explain several specific constraints on kin classification proposed by previous researchers.

(joint work with Terry Regier)

*Bio:*
Charles Kemp is based in the psychology department and works on probabilistic models of human learning and reasoning. Among other topics he is interested in the extent to which human language qualifies as a near-optimal communication system.

Cheers,
Nathan & Ben

From nathan at cmu.edu Mon Mar 14 20:30:36 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Mon, 14 Mar 2011 20:30:36 -0400
Subject: [CL+NLP Lunch] next Tuesday at 12:30: Charles Kemp on Cross-Cultural Lexical Semantics
In-Reply-To:
References:
Message-ID:

A reminder that Charles Kemp's talk is tomorrow at 12:30. See you there!

Nathan & Ben

From nathan at cmu.edu Mon Apr 4 21:48:25 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Mon, 4 Apr 2011 21:48:25 -0400
Subject: [CL+NLP Lunch] Next Monday, 3:30: Yoav Goldberg on syntactic parsing for Hebrew
Message-ID:

All,

This month's CL+NLP seminar will feature Yoav Goldberg, and will take place at a special time:

*Monday, April 11, 3:30pm*
*NSH 3305*

Dealing with the Complexities of Syntactic Parsing in Hebrew: Addressing agreement, word-segmentation and rich morphology in a fast dependency parser

Abstract:
I will describe my experience with designing a syntactic parser for Hebrew, a language with rich morphology and a small treebank. After describing some of the characteristics that make automatic syntactic processing of Hebrew challenging and discussing some data representation issues, I will present some solutions to these challenges. These include improvements of a semi-supervised broad-coverage tagger, and a greedy dependency parser which can accommodate rich feature-sets and cope with noisy data while remaining fast. I will also briefly discuss a constituency parsing system that performs joint morphological segmentation and syntactic parsing. The work on Hebrew brought about solutions that also work well for English; I will point to these results when appropriate.

Bio:
Yoav Goldberg is finishing up his PhD at Ben-Gurion University in Israel under the supervision of Prof. Michael Elhadad. His PhD work revolves around natural language processing in Hebrew and morphologically rich languages, syntactic parsing, and doing fun stuff with computers and language. Prior to his PhD he participated in the Israeli Hi-Tech industry, mostly as a freelance security consultant. His research interests include structured prediction, computational creativity and confidence estimation.

Host: Shay Cohen

Snacks will be served.
From nathan at cmu.edu Sun Apr 10 23:11:42 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Sun, 10 Apr 2011 23:11:42 -0400
Subject: [CL+NLP Lunch] Next Monday, 3:30: Yoav Goldberg on syntactic parsing for Hebrew
In-Reply-To:
References:
Message-ID:

Reminder: Yoav's talk is tomorrow. See you at 3:30!

Nathan & Ben

From nathan at cmu.edu Thu Nov 3 16:04:09 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Thu, 3 Nov 2011 16:04:09 -0400
Subject: [CL+NLP Lunch] Nov. 15 @ noon: Diane Litman, "Automatically Predicting Peer-Review Helpfulness" / Linguistics Reading Group
Message-ID:

All,

I am happy to announce that Prof. Diane Litman will be at CMU a week from Tuesday to speak about research in an educational application of natural language processing. Details are included below.

Also, the LTI Linguistics Reading Group invites new members: it is an informal forum for discussing research in various linguistics-related topics. The group typically meets once a week with a rotating set of topic areas (cognition/learning; sociolinguistics/discourse; descriptive linguistics). If any of these are of interest to you, sign up for the mailing list: https://mailman.srv.cs.cmu.edu/mailman/listinfo/lti-cogling

Thanks,
Ben & Nathan

CL+NLP Lunch
*Tuesday, Nov. 15 @ noon in GHC 4405*
Lunch will be provided.

*Diane Litman*
Professor, Computer Science
Senior Scientist, Learning Research and Development Center
University of Pittsburgh

*Automatically Predicting Peer-Review Helpfulness*

One path to improving the quality of student writing has involved the use of a peer review process, often supported by web-based technology.
The long-term goal of our research is to use Natural Language Processing to address three core problems in peer-review of writing: reviews are often stated in ineffective ways, reviews and papers do not focus on important paper aspects, and authors do not have a process for organizing paper revisions. This talk will present our research on automatically predicting the helpfulness of peer reviews, one important task for improving the quality of feedback received by students, as well as for helping students write better reviews. We first examine whether standard product review analysis techniques also apply to our new context of peer reviews. We also investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that structural features, review unigrams and meta-data are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction. Finally, we investigate how different types of perceived helpfulness might influence the utility of features for automatic prediction. Our feature selection results show that certain low-level linguistic features are more useful for predicting student perceived helpfulness, while high-level cognitive constructs are more effective in modeling experts' perceived helpfulness. This work is done in collaboration with Wenting Xiong, Christian Schunn, and Kevin Ashley, University of Pittsburgh. Bio: Diane Litman is Professor of Computer Science, Senior Scientist with the Learning Research and Development Center, and faculty with the Graduate Program in Intelligent Systems, all at the University of Pittsburgh. She has been working in the field of artificial intelligence since she received her Ph.D. degree in Computer Science from the University of Rochester. 
Before joining Pitt, she was a member of the Artificial Intelligence Principles Research Department, AT&T Labs - Research (formerly Bell Laboratories). Dr. Litman's current research focuses on enhancing the effectiveness of educational technology through the use of spoken and natural language processing, affective computing, and machine learning and other statistical methods. Dr. Litman has been Chair of the North American Chapter of the Association for Computational Linguistics, has co-authored multiple papers winning best paper awards, and has been awarded Senior Member status by the Association for the Advancement of Artificial Intelligence.

From nathan at cmu.edu Thu Nov 10 14:02:06 2011
From: nathan at cmu.edu (Nathan Schneider)
Date: Thu, 10 Nov 2011 14:02:06 -0500
Subject: [CL+NLP Lunch] Nov. 15 @ noon: Diane Litman, "Automatically Predicting Peer-Review Helpfulness"
Message-ID:

Reminder: this coming Tuesday!

CL+NLP Lunch
*Tuesday, Nov. 15 @ noon in GHC 4405*
Lunch will be provided.

*Diane Litman*
Professor, Computer Science
Senior Scientist, Learning Research and Development Center
University of Pittsburgh

*Automatically Predicting Peer-Review Helpfulness*

From benlambert at cmu.edu Tue Nov 15 08:01:10 2011
From: benlambert at cmu.edu (Benjamin Lambert)
Date: Tue, 15 Nov 2011 08:01:10 -0500
Subject: [CL+NLP Lunch] Nov. 15 @ noon: Diane Litman, "Automatically Predicting Peer-Review Helpfulness"
Message-ID:

Reminder: this afternoon!

CL+NLP Lunch
Tuesday, Nov. 15 @ noon in GHC 4405
Lunch will be provided.

Diane Litman
Professor, Computer Science
Senior Scientist, Learning Research and Development Center
University of Pittsburgh

Automatically Predicting Peer-Review Helpfulness