Connectionists: Call for Shared Task Participation - SemEval 2017 Task 1: Semantic Textual Similarity (STS)
Daniel Cer
cer at google.com
Tue Oct 11 13:49:21 EDT 2016
Call for Shared Task Participation
SemEval 2017 Task 1
Semantic Textual Similarity (STS)
Semantic Textual Similarity (STS) measures the degree of equivalence in the
underlying semantics of paired snippets of text. While making such an
assessment is trivial for humans, constructing algorithms and computational
models that mimic human-level performance represents a difficult and deep
natural language understanding problem.
STS evaluations have seen significant progress in methods targeted at a
specific language such as English or Spanish. For the 2017 shared task, the
emphasis is on building multilingual textual similarity models that are
capable of assessing both same language and cross-lingual sentence pairs.
The primary evaluation for the shared task assesses methods over a
combination of same language pairs in Arabic, English and Spanish as well
as cross-lingual Arabic-English and Spanish-English pairs.
To encourage the development of methods that can be readily applied or
adapted to new languages, we also provide an optional evaluation track with
a surprise language that will only be announced at the beginning of the
evaluation period. This optional track provides an opportunity to explore
STS models capable of zero-shot learning via mechanisms such as
multilingual embeddings.
In addition to the multilingual primary evaluation and the surprise
language track, a number of language and language pair specific tracks are
also provided. We hope these tracks will give participants with particular
linguistic expertise a chance to excel, as well as an opportunity to
compare the performance of multilingual and language-specific methods.
Task Definition
===============
Given two sentences, participants are asked to produce a continuous valued
similarity score on a scale from 0 to 5, with 0 indicating that the
semantics of the sentences are completely independent and 5 signifying
semantic equivalence. Performance is assessed by computing the Pearson
correlation between machine assigned semantic similarity scores and human
judgments.
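For concreteness, the following is a minimal sketch of how the Pearson-based scoring could be reproduced, assuming gold and system scores are stored one per line in two plain-text files (the file names are hypothetical; the official evaluation script may differ):

    # Hedged sketch: Pearson correlation between system and gold scores,
    # assuming one floating-point score per line (hypothetical file names).
    from scipy.stats import pearsonr

    def load_scores(path):
        # Read one similarity score per non-empty line.
        with open(path) as f:
            return [float(line.strip()) for line in f if line.strip()]

    gold = load_scores("gold_scores.txt")      # human judgments, 0-5
    system = load_scores("system_scores.txt")  # machine-assigned scores, 0-5

    r, _ = pearsonr(gold, system)
    print("Pearson r: %.4f" % r)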
Following the emphasis on building multilingual and cross-lingual models,
the 2017 shared task is organized into the following seven multilingual and
cross-lingual tracks:
Track 0 - Primary: Combined evaluation of all announced monolingual and
          cross-lingual language pairings explored by the 2017 task:
          ar-ar, ar-en, en-en, es-en, and es-es. The primary track will
          not include the surprise language evaluation data.
Track 1 - Arabic-Arabic: Evaluation only on ar-ar pairs.
Track 2 - Arabic-English: Evaluation only on ar-en pairs.
Track 3 - Spanish-Spanish: Evaluation only on es-es pairs.
Track 4 - Spanish-English: Evaluation only on es-en pairs.
Track 5 - English-English: Evaluation only on en-en pairs.
Track 6 - Surprise language: Evaluation only on pairs in a surprise
          language announced during the evaluation period.
For all language pairings, participants will be provided with two
sentence-length snippets of text, s1 and s2, and will use the two snippets
to compute and return a continuous-valued semantic similarity score.
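As a rough illustration of the expected input and output only (not a recommended system), the sketch below scores a pair of pre-tokenized snippets by the cosine similarity of averaged word vectors and rescales the result to the 0 to 5 range; the embedding table is assumed to be any pre-trained, possibly multilingual, word-vector lookup:

    # Illustration only: cosine similarity of averaged word vectors,
    # rescaled to the 0-5 STS range. The "embeddings" dict (token -> vector)
    # is an assumed pre-trained lookup, possibly multilingual for the
    # cross-lingual tracks; this is not an official baseline.
    import numpy as np

    def sentence_vector(tokens, embeddings, dim=300):
        # Average the vectors of all tokens found in the embedding table.
        vecs = [embeddings[t] for t in tokens if t in embeddings]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def sts_score(s1_tokens, s2_tokens, embeddings):
        # Map cosine similarity onto the 0-5 scale, clipping negatives to 0.
        v1 = sentence_vector(s1_tokens, embeddings)
        v2 = sentence_vector(s2_tokens, embeddings)
        denom = float(np.linalg.norm(v1) * np.linalg.norm(v2))
        cosine = float(np.dot(v1, v2)) / denom if denom > 0 else 0.0
        return 5.0 * max(cosine, 0.0)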
The cross-lingual language pairings (ar-en, es-en) only differ from the
monolingual language pairings (ar-ar, en-en, es-es) in that the two text
snippets in each pair are written in different languages. The inclusion of
cross-lingual STS pairs follows a successful pilot in 2016 that paired
English and Spanish sentences. Depending on the approach used to compute
the similarity scores, adapting the underlying model to handle
cross-lingual pairs may present varying degrees of difficulty.
Participants are encouraged to review the successful approaches to
monolingual and cross-lingual STS from prior years of the STS shared task
(Agirre et al. 2016; Agirre et al. 2015; Agirre et al. 2014; Agirre et al.
2013; Agirre et al. 2012).
2017 Data
=========
This year's shared task includes one evaluation set for each of the seven
tracks described above. Each evaluation set consists of between 200 and
250 sentence pairs. Within each evaluation set, we will attempt to
approximately balance the distribution of STS scores.
For training data, participants are encouraged to make use of all existing
English, Spanish and cross-lingual English-Spanish data sets from prior STS
evaluations. This includes all previously released trial, training and
evaluation data.
Since this is the first year that we will include Arabic as part of an STS
evaluation, we will release training data for both monolingual Arabic and
cross-lingual Arabic-English. Each training set will consist of
approximately 14,000 pairs sourced from prior English STS evaluations.
As with the 2016 evaluation, participants are allowed and very much
encouraged to train purely unsupervised models and model components on
arbitrary data (e.g., unsupervised word embeddings).
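For example, the following is a minimal sketch of training such unsupervised word embeddings on arbitrary monolingual text with gensim (one possible toolkit among many; the corpus file name is hypothetical and the text is assumed to be pre-tokenized, one sentence per line):

    # Hedged sketch: unsupervised word embeddings trained on arbitrary text.
    # gensim is only one possible choice; "corpus.txt" is a hypothetical
    # file containing one pre-tokenized sentence per line.
    from gensim.models import Word2Vec

    with open("corpus.txt") as f:
        sentences = [line.split() for line in f if line.strip()]

    model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)
    model.wv.save_word2vec_format("embeddings.vec")  # plain-text vector file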
Participation
=============
[Register]
To register, please complete the following form:
https://docs.google.com/forms/d/e/1FAIpQLScXnt7qeioCPyxu6dv9wrSDYaF04bRgVBFCUbahxsAG6F43Sg/viewform
[Website and trial data]
For more details, including trial data, see the STS SemEval 2017 Task 1
webpage at: http://alt.qcri.org/semeval2017/task1/
[Mailing List]
Join the mailing list for task updates and discussion at:
http://groups.google.com/group/STS-semeval.
Important dates
===============
Trial data ready: Wed 21 Sep 2016
Training data ready: Mon 24 Oct 2016
Evaluation start: Mon 09 Jan 2017
Evaluation end: Mon 30 Jan 2017
Results posted: Mon 06 Feb 2017
Paper submissions due: Mon 27 Feb 2017
Author notifications: Mon 03 Apr 2017
Camera ready submissions due: Mon 17 Apr 2017
SemEval workshop: Summer 2017
Organizers (alpha. order)
==========
Eneko Agirre, Daniel Cer, Mona Diab, Lucia Specia
References
==========
Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre,
Rada Mihalcea, German Rigau, Janyce Wiebe. SemEval-2016 Task 1: Semantic
Textual Similarity, Monolingual and Cross-Lingual Evaluation. Proceedings
of SemEval 2016.
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor
Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada
Mihalcea, German Rigau, Larraitz Uria and Janyce Wiebe. SemEval-2015 Task
2: Semantic Textual Similarity, English, Spanish and Pilot on
Interpretability. Proceedings of SemEval 2015.
Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor
Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau and Janyce Wiebe.
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. Proceedings
of SemEval 2014.
Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre and Weiwei Guo.
*SEM 2013 shared task: Semantic Textual Similarity. Proceedings of *SEM
2013.
Eneko Agirre, Daniel Cer, Mona Diab and Aitor Gonzalez-Agirre. SemEval-2012
Task 6: A Pilot on Semantic Textual Similarity. Proceedings of SemEval 2012.