Connectionists: Call for Participation: SIGTYP 2020 Shared Task on the prediction of typological features

Fri Apr 3 11:51:00 EDT 2020

https://sigtyp.github.io/st2020.html

The SIGTYP workshop, co-located with the EMNLP 2020 conference in Punta
Cana (Dominican Republic), is offering a shared task on the prediction of
typological features. The shared task encompasses nearly 2,000 languages,
with typological features taken from the World Atlas of Language Structures
(WALS; Dryer and Haspelmath 2013).

To participate in the shared task, you will build a system that can predict
typological properties of languages, given a handful of observed features.
Training examples and development examples have already been provided (see
link below). All submitted systems will be compared on a held-out test set.

Moreover, you will be invited to describe your system in a system paper for
the SIGTYP workshop proceedings. The task organisers will write an overview
paper that describes the task and summarises the different approaches
taken and their results.

*Important Links*

- Download Train and Dev data:
https://github.com/sigtyp/ST2020/tree/master/data
- Register for the Task! https://sigtyp.github.io/st2020-reg.html

*Important Dates*

- Training data Release: 26 March 2020
- Test data Release: 20 June 2020
- Submissions Due: 1 July 2020
- Writeup Due: 1 August 2020

*Description*

The typological features in WALS represent one approach to the
categorization of the languages of the world according to their linguistic
properties, such as their syntax, morphology, and phonology. One example
of such a typological feature is the basic word order
feature. For instance, English is best described as a subject-verb-object
(SVO) language whereas Japanese is best described as a subject-object-verb
(SOV) language.

One major issue with WALS, however, is that it is both sparse and skewed in
terms of language-feature annotations. It is sparse in the sense that most
languages only have annotations for a handful of features, and skewed in
the sense that a few features have much wider coverage than others.
Luckily, such features often correlate with one another, which allows for
prediction of those features from others. For instance, languages where the
verb precedes the object tend to have prepositions, e.g. Norwegian, whereas
languages where the object precedes the verb tend to have
postpositions, e.g. Japanese.
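
As a rough illustration of how one feature can be predicted from another,
the sketch below fits a conditional majority baseline: for each observed
value of one feature, it predicts the most frequent value of the target
feature. The records are a hand-entered toy set, not the shared task files,
and the feature names (verb_object, adposition) and function names are
assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Toy, hand-entered records for illustration; this is NOT the shared task
# data format, and the feature names are hypothetical.
TOY_LANGUAGES = {
    "Norwegian": {"verb_object": "VO", "adposition": "prepositions"},
    "English": {"verb_object": "VO", "adposition": "prepositions"},
    "Spanish": {"verb_object": "VO", "adposition": "prepositions"},
    "Japanese": {"verb_object": "OV", "adposition": "postpositions"},
    "Turkish": {"verb_object": "OV", "adposition": "postpositions"},
    "Hindi": {"verb_object": "OV", "adposition": "postpositions"},
}


def fit_conditional_majority(records, given, target):
    """Map each value of `given` to the most frequent co-occurring `target` value."""
    counts = defaultdict(Counter)
    for features in records.values():
        # Skip languages where either feature is unannotated (sparsity).
        if given in features and target in features:
            counts[features[given]][features[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, features, given):
    """Return the predicted target value, or None if `given` is unobserved or unseen."""
    return model.get(features.get(given))


model = fit_conditional_majority(TOY_LANGUAGES, "verb_object", "adposition")
print(predict(model, {"verb_object": "OV"}, "verb_object"))
```

A baseline like this only exploits pairwise correlations; it is meant as a
point of reference, not as a description of any submitted system.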

Although there is a significant amount of previous work dealing with
versions of this task (*Daumé III and Campbell 2007; Bjerva et al. 2019;
Ponti et al. 2019*), important design choices have been frequently ignored.
Some papers controlled for genetic relationships between training and
evaluation languages, but little to no work has considered controlling for
geographical proximity.
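
One common way to control for genetic relationships is to split languages
by family, so that no family contributes to both training and evaluation.
The sketch below shows that idea on a small hand-entered mapping; it is not
the organisers' actual split, and the function name and parameters are
assumptions for illustration. The same grouping could in principle be
applied to a geographic label instead of a family label.

```python
import random


def split_by_family(language_to_family, eval_fraction=0.25, seed=0):
    """Hold out whole families so train and eval share no family."""
    families = sorted(set(language_to_family.values()))
    rng = random.Random(seed)
    rng.shuffle(families)
    # Always hold out at least one family.
    n_eval = max(1, round(len(families) * eval_fraction))
    eval_families = set(families[:n_eval])
    train = sorted(
        lang for lang, fam in language_to_family.items() if fam not in eval_families
    )
    held_out = sorted(
        lang for lang, fam in language_to_family.items() if fam in eval_families
    )
    return train, held_out


# Toy mapping for illustration only; not taken from the shared task files.
TOY_FAMILIES = {
    "Norwegian": "Indo-European",
    "English": "Indo-European",
    "Hindi": "Indo-European",
    "Japanese": "Japonic",
    "Turkish": "Turkic",
    "Finnish": "Uralic",
    "Hungarian": "Uralic",
}

train_langs, eval_langs = split_by_family(TOY_FAMILIES)
print(train_langs, eval_langs)
```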

The shared task will consist of two settings (subtasks):

   1. *Constrained*: only provided training data can be employed.
   2. *Unconstrained*: training data can be extended with any external
   source of information (e.g. pre-trained embeddings, raw texts, etc.)

*Organizers*

Johannes Bjerva
Isabelle Augenstein
Aditi Chaudhary
Ekaterina Vylomova
Edoardo M. Ponti
Giuseppe Celano
Liz Salesky
Ryan Cotterell
Michael Regan
Sabrina J. Mielke

*Contact*

- email: sigtyp AT gmail DOT com
- website: https://sigtyp.github.io/st2020.html