Connectionists: CFP: ClinSpEn subtask at Biomedical WMT Shared Task (WMT/EMNLP 2022): translation of clinical cases, entities, terminologies and ontologies
Martin Krallinger
krallinger.martin at
Mon Jul 18 05:25:23 EDT 2022
Call for Participation ClinSpEn @ Biomedical WMT Shared Task (WMT/EMNLP 2022)
Automatic Translation of Clinical cases, ontologies & medical entities:
Spanish - English
ClinSpEn is part of the Biomedical WMT 2022 shared task and aims to
promote the development and evaluation of machine translation systems
adapted to the medical domain, with three highly relevant sub-tracks:
clinical cases, medical controlled vocabularies/ontologies, and clinical
terms and entities extracted from medical content.
Key information:
ClinSpEn sub-track:
Biomedical WMT:
Main WMT:
EMNLP conference:
Sample/Training Data:
Clinical Cases:
Clinical Terms:
Ontology Concepts:
Machine translation applied to the clinical domain is an especially
challenging task due to the complexity of medical language and the heavy
use of health-related technical terms and medical expressions. Therefore,
there is a large community of specialized medical translators able to deal
with medical narratives, terminologies or the use of ambiguous
abbreviations and acronyms.
Taking into account the relevance, impact and diversity of health-related
content, as well as the rapidly growing number of publications, EHRs,
clinical trials, informed consent documents and medical terminologies,
there is a pressing need to generate more robust medical machine
translation resources together with independent quality evaluation.
Recent advances in machine translation technologies, together with the use
of other NLP components, are showing promising results; domain
adaptation of MT approaches can therefore have a significant impact in
unlocking key information from medical content.
The ClinSpEn sub-task of Biomedical WMT proposes three sub-tracks, each
associated with a highly relevant medical machine translation application
scenario:
ClinSpEn-CC (Clinical Cases): translation of clinical case
documents from English to Spanish, a type of document relevant for
processing both medical literature and clinical records.
ClinSpEn-CT (Clinical Terms): translation of clinical terms and entity
mentions from Spanish to English. The terms used were extracted directly
from medical literature and clinical records, with particular focus on
diseases, symptoms, findings, procedures and professions.
ClinSpEn-OC (Ontology Concepts): translation of clinical controlled
vocabularies and ontology concepts from English to Spanish. Ontologies and
structured vocabularies represent a key resource for semantic
interoperability, entity linking, biomedical knowledge bases and precision
medicine, and thus there is a pressing need to generate multilingual
biomedical ontologies for a range of clinical applications.
A decently-sized sample set for each data type has been released.
Participants may use it to test their existing systems or try out new ones.
In addition to the test set, manually translated by professional medical
translators, participants will also have access to a larger background
collection for each of the three sub-tracks, which might serve as an
additional resource and promote scalability and robustness assessment
of machine translation technology.
Test and Background Set Release: July 21st, 2022
Participant Predictions Due: July 28th, 2022
Paper Submission Deadline: September 7th, 2022
Notification of Acceptance (peer-reviews): October 9th, 2022
Camera-ready Version Due: October 16th, 2022
WMT @ EMNLP: December 7th and 8th, 2022
[All deadlines are in AoE (Anywhere on Earth)]
For the time being, participants may register using the ClinSpEn
registration form at:
This form will be used to support teams during their participation and keep
them updated on the official WMT/EMNLP registration, as well as on all
related deadlines and important news.
Publications and WMT workshop
Teams participating in the ClinSpEn sub-track of Biomedical WMT will be
invited to contribute a system description paper for the WMT 2022 Working
Notes proceedings. More information on the paper’s specifications,
formatting guidelines and review process is available at:
If you are interested in Machine Translation, the biomedical domain or
other language combinations, remember to check out the Biomedical WMT site
and the rest of this year’s sub-tracks and language pairs:
ClinSpEn Organizers
Salvador Lima-López (Barcelona Supercomputing Center, Spain)
Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
Martin Krallinger (Barcelona Supercomputing Center, Spain)
Biomedical WMT Organizers
Rachel Bawden (University of Edinburgh, UK)
Giorgio Maria Di Nunzio (University of Padua, Italy)
Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
Cristian Grozea (Fraunhofer Institute, Germany)
Antonio Jimeno Yepes (University of Melbourne, Australia)
Salvador Lima-López (Barcelona Supercomputing Center, Spain)
Martin Krallinger (Barcelona Supercomputing Center, Spain)
Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)
Mariana Neves (German Federal Institute for Risk Assessment, Germany)
Roland Roller (DFKI, Germany)
Amy Siu (Beuth University of Applied Sciences, Germany)
Philippe Thomas (DFKI, Germany)
Federica Vezzani (University of Padua, Italy)
Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)
Dina Wiemann (Novartis, Switzerland)
Lana Yeganova (NCBI/NLM/NIH, USA)