[CL+NLP Lunch] Jan Botha, CL+NLP Lunch Oct 24 @ noon

Dani Yogatama dyogatama at cs.cmu.edu
Wed Oct 16 14:26:23 EDT 2013


*CL+NLP Lunch* (http://www.cs.cmu.edu/~nlp-lunch/)
*Speaker*: Jan Botha, Oxford University
*Date*: Thursday, October 24, 2013
*Time*: 12:00 noon
*Venue*: GHC 6115

*Title*: Unsupervised learning of non-concatenative morphology

*Abstract*:
The popular view of words as sequences of morphemes may work
for unsupervised morphological analysis of various languages, but it
is overly simplistic in the face of non-concatenative phenomena such
as root-templatic stem derivation in Semitic languages. I'll present a
nonparametric Bayesian approach that addresses concatenative and
non-concatenative morphology simultaneously. Experiments on Arabic and
Hebrew show that the richer account of stem morphology improves
morphological segmentation. Identification of discontiguous root
morphemes is fairly accurate and could be a source of features for
downstream language processing tasks. To illustrate the flexibility of
the approach, I'll also sketch some untested instantiations targeting
other non-concatenative processes such as circumfixing and infixing.
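For readers unfamiliar with root-templatic derivation, here is a toy Python sketch of the phenomenon the abstract refers to. It is an illustration only and not the nonparametric Bayesian approach presented in the talk. A consonantal root (Arabic k-t-b, associated with writing) is interleaved with a vowel template, so the root never appears as a contiguous substring, which is why a purely sequential segmentation cannot isolate it. The function names, the "C" slot convention and the doubled-vowel transliteration of long vowels are assumptions made for this example.

```python
# Toy illustration only (not the speaker's model): root-templatic stem
# formation, where a discontiguous consonantal root fills the "C" slots
# of a template. Names and the "C" convention are hypothetical.

def interdigitate(root: str, template: str, slot: str = "C") -> str:
    """Insert the consonants of `root`, in order, into the slot positions of `template`."""
    if template.count(slot) != len(root):
        raise ValueError("number of template slots must equal root length")
    consonants = iter(root)
    return "".join(next(consonants) if ch == slot else ch for ch in template)


def extract_root(word: str, template: str, slot: str = "C") -> str:
    """Invert interdigitate() when the template is already known."""
    if len(word) != len(template):
        raise ValueError("word and template must have equal length")
    for w, t in zip(word, template):
        if t != slot and w != t:
            raise ValueError("word does not match the template's fixed characters")
    # Read the root off the slot positions, skipping the template's vowels.
    return "".join(w for w, t in zip(word, template) if t == slot)


if __name__ == "__main__":
    root = "ktb"
    for template in ("CaCaC", "CaaCiC", "maCCuuC"):
        word = interdigitate(root, template)
        print(template, "->", word, "| root contiguous?", root in word)
```

Running it yields katab, kaatib and maktuub, none of which contains "ktb" as a contiguous substring. The hard part, which the unsupervised approach in the talk addresses, is that neither the root nor the template is given in advance.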

*Biography*:
Jan Botha is a fourth-year PhD student at Oxford University. As a
member of the Computational Linguistics Group, he works on statistical
modelling of morphologically rich languages. This interest
has led him on excursions into Bayesian nonparametrics and, more
recently, distributed representation learning. Before moving to Oxford
to take up his Rhodes Scholarship, he completed an interdisciplinary
Honours Bachelor's degree in Physics, Maths and Computer Science at
Stellenbosch University in South Africa.