Connectionists: Important UPDATES/EXTENSION: ClinSpEn medical machine translation sub-track (Biomedical WMT Task, EMNLP 2022)

Salvador Lima salvador.limalopez at gmail.com
Thu Aug 4 06:19:28 EDT 2022


Important UPDATES/EXTENSION: ClinSpEn sub-track (Biomedical WMT Task, EMNLP
2022)

Machine Translation of Clinical cases, ontologies & medical entities:
Spanish - English

https://temu.bsc.es/clinspen/

The evaluation period has been extended, the test and background data are
now available on Zenodo, and submissions are open on CodaLab.

The ClinSpEn track of the Biomedical WMT 2022 shared task aims to address
a pressing need and an emerging research topic: the development and
exploitation of multilingual clinical NLP and text-mining applications.

Recent advances in machine translation (MT), in particular neural
approaches adapted to specific domains and text genres, have produced
promising results that facilitate the processing of healthcare and
clinical data beyond language silos.

The ClinSpEn sub-track aims to promote the use of advanced machine
translation technologies in three high-impact healthcare application
scenarios:

(1) automatic translation of clinical case documents, to examine how MT
could be further applied to clinical records

(2) automatic translation of clinical terms and entity mentions extracted
directly from medical records and literature to improve multilingual
semantic annotation technologies

(3) automatic translation of ontologies and controlled vocabulary
concepts, of utmost importance for multilingual data and concept
normalization



Each of these scenarios is covered by a dedicated benchmark dataset, used
for evaluation in the ClinSpEn track of Biomedical WMT:

ClinSpEn-CC (Clinical Cases): EN>ES translation of clinical case documents.

ClinSpEn-CT (Clinical Terms): ES>EN translation of clinical terms and
entity mentions extracted from records and literature.

ClinSpEn-OC (Ontology Concepts): EN>ES translation of widely used open
clinical controlled vocabularies and ontology concepts.



Important links:

   - ClinSpEn web: https://temu.bsc.es/clinspen/
   - Biomedical WMT web:
     https://statmt.org/wmt22/biomedical-translation-task.html
   - WMT2022: https://statmt.org/wmt22/
   - EMNLP conference: https://2022.emnlp.org/
   - Data (NEW!):
      - Clinical Cases: https://doi.org/10.5281/zenodo.6497350
      - Clinical Terms: https://doi.org/10.5281/zenodo.6497372
      - Ontology Concepts: https://doi.org/10.5281/zenodo.6497388
   - CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696
   - Team Registration (mandatory): https://temu.bsc.es/clinspen/registration/

For the ClinSpEn track, professional medical translators have produced
Gold Standard manual translations, which will be used to evaluate the
participating teams. The primary evaluation metric for this track will be
SacreBLEU.
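For teams that want a rough sanity check of their outputs before
submitting, the sketch below illustrates what corpus-level BLEU computes:
clipped n-gram precisions (n = 1 to 4) pooled over the whole corpus,
combined by a geometric mean and scaled by a brevity penalty. It is a
simplified illustration under stated assumptions (whitespace tokenization,
exactly one reference per segment, no smoothing), and the function names
are hypothetical. It is NOT the official SacreBLEU scorer, which also
applies its own standardized tokenization (13a by default), so its numbers
will differ; official scores should come from the sacrebleu Python package
itself.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (hypothetical helper).

    Assumptions: one reference per hypothesis, whitespace tokenization,
    case-sensitive matching, no smoothing. Not the official SacreBLEU scorer.
    """
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        # Some n-gram precision is zero, so the unsmoothed geometric mean is zero.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the patient was admitted with fever"]
    refs = ["the patient was admitted with fever"]
    print(corpus_bleu(hyps, refs))  # → 100.0
```

Pooling the counts over the corpus, rather than averaging per-sentence
scores, is what makes this a corpus-level metric, which is also how
SacreBLEU reports its BLEU score.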

Participants will also have access to a larger background collection, to
support the assessment of the scalability and robustness of machine
translation technology.


Updated schedule:

   - Participant Predictions Due: August 30th, 2022 (UPDATED EXTENSION!)
   - Paper Submission: September 7th, 2022
   - Acceptance notification: October 9th, 2022
   - Camera-ready version: October 16th, 2022
   - WMT workshop at EMNLP: December 7th and 8th, 2022

Publications and workshop


Participating teams will be invited to contribute a system description
paper to the WMT 2022 Working Notes proceedings. The WMT workshop will be
held as part of the prestigious EMNLP 2022 conference. More information on
paper specifications, formatting guidelines and the review process is
available at: https://statmt.org/wmt22/index.html.



ClinSpEn Track Organizers

   - Salvador Lima-López (BSC)
   - Darryl Johan Estrada (BSC)
   - Eulàlia Farré-Maduell (BSC)
   - Martin Krallinger (BSC)


Biomedical WMT Organizers

   - Rachel Bawden (University of Edinburgh, UK)
   - Giorgio Maria Di Nunzio (University of Padua, Italy)
   - Darryl Johan Estrada (Barcelona Supercomputing Center, Spain)
   - Eulàlia Farré-Maduell (Barcelona Supercomputing Center, Spain)
   - Cristian Grozea (Fraunhofer Institute, Germany)
   - Antonio Jimeno Yepes (University of Melbourne, Australia)
   - Salvador Lima-López (Barcelona Supercomputing Center, Spain)
   - Martin Krallinger (Barcelona Supercomputing Center, Spain)
   - Aurélie Névéol (Université Paris Saclay, CNRS, LISN, France)
   - Mariana Neves (German Federal Institute for Risk Assessment, Germany)
   - Roland Roller (DFKI, Germany)
   - Amy Siu (Beuth University of Applied Sciences, Germany)
   - Philippe Thomas (DFKI, Germany)
   - Federica Vezzani (University of Padua, Italy)
   - Maika Vicente Navarro (Maika Spanish Translator, Melbourne, Australia)
   - Dina Wiemann (Novartis, Switzerland)
   - Lana Yeganova (NCBI/NLM/NIH, USA)



-- 
Salvador Lima Lopez
RESEARCH ENGINEER
Life Sciences - Text Mining, BSC-CNS
Barcelona, Spain