Connectionists: Final DrugProt CFP: Drug-Interactions & Large Scale Text Mining sub-tracks (BioCreative VII)

Tue Jul 27 05:24:16 EDT 2021

*DrugProt Shared Task (BioCreative VII Track 1, 2021)*

*Text mining drug-protein/gene interactions (DrugProt) & Large-scale Text
Mining track*

https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-1/

BioCreative efforts have provided highly relevant resources for advancing
biomedical text mining research, including datasets and system
evaluations (e.g. the BioBERT benchmark datasets, the ChemProt corpus and
the CHEMDNER corpus).

We are organizing the DrugProt track, which focuses on (1) the automatic
extraction of relations between drugs/chemicals and genes/proteins of
interest for drug discovery and biomedicine and (2) large-scale text mining.

It is becoming increasingly challenging to efficiently exploit drug-related
information described in the growing body of scientific literature. There
is a range of different types of drug-gene/protein interactions, and their
systematic extraction and characterization is essential to analyze, predict
and explore key biomedical properties underlying high-impact biomedical
applications.

We foresee that the DrugProt track will promote the development of NLP
techniques to extract critical health information, generating results
useful for:

- Drug discovery, drug repurposing & drug design

- Drug-induced adverse reactions, off-target interactions

- Molecular medicine, systems biology and bioinformatics

- Biomedical knowledge graph mining

To this end, the DrugProt organizers have released a large training corpus
of manually annotated entity mentions for drugs/chemicals and
genes/proteins, together with their interactions (13 different types of
interactions).

Participating DrugProt teams will be provided with the following corpus:

- PubMed abstracts (3500 training, 750 development, 750 test)

- Manually annotated Gold Standard chemical compound mentions (> 65000)

- Manually annotated Gold Standard gene/protein mentions (> 60000)

- Manually annotated Gold Standard drug/chemical-protein/gene interactions
(> 24000)

In addition to the main DrugProt task, we have also included a sub-track
on *Large Scale Biomedical Text mining*, asking teams to automatically
detect interactions in a collection of over *2.3 million* records with
almost *54 million* entity annotations.

*Key information:*

*DrugProt web:*
https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-1/

*DrugProt Corpus: *https://zenodo.org/record/5119892#.YP_ObO3tZpk

*Large-Scale Corpus: *https://zenodo.org/record/5119879#.YP_Ovu3tZpl

*Registration:*
https://docs.google.com/forms/d/e/1FAIpQLScdMnKFMncL8qDkcRx6aV6lYRm8PbufPs1rIAODwxCcPoLkcg/viewform

Evaluation will use the micro-averaged F-measure, comparing the
automatically extracted relations against manually labelled Gold Standard
relations.
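
As a rough illustration of this metric (not the official evaluation script),
the sketch below pools true positives, predictions and gold relations across
all relation types before computing precision, recall and F-measure. The
tuple layout (document id, relation type, chemical id, gene/protein id), the
function name micro_f_measure and the relation labels in the toy data are
assumptions made for this example only.

```python
from typing import Iterable, Tuple

# Hypothetical relation representation (an assumption, not the official format):
# (document id, relation type, chemical entity id, gene/protein entity id)
Relation = Tuple[str, str, str, str]


def micro_f_measure(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) over all relations."""
    gold_set = set(gold)
    pred_set = set(predicted)
    # A prediction counts as correct only if every field matches a gold relation.
    true_positives = len(gold_set & pred_set)
    # Counts are pooled over all relation types, which is what makes this micro-averaged.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with placeholder labels and ids, for illustration only.
gold = [
    ("doc1", "INHIBITOR", "T1", "T5"),
    ("doc1", "ACTIVATOR", "T2", "T6"),
    ("doc2", "SUBSTRATE", "T1", "T3"),
    ("doc2", "INHIBITOR", "T4", "T7"),
]
predicted = [
    ("doc1", "INHIBITOR", "T1", "T5"),
    ("doc1", "ACTIVATOR", "T2", "T7"),  # wrong gene/protein argument
    ("doc2", "SUBSTRATE", "T1", "T3"),
]

precision, recall, f1 = micro_f_measure(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

With this toy data, two of the three predictions match the four gold
relations, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.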

*Important dates*

✓ Test set release: July 19th 2021
✓ Large-scale Text Mining sub-track set release: July 19th 2021
DrugProt test set prediction submission due: September 15th 2021
Large-scale Text Mining sub-track submission due: September 20th 2021
Short technical systems description paper due: October 1st 2021
Revised paper submission due: October 17th 2021
BioCreative VII Workshop (virtual): November 8th-10th 2021

*BioCreative VII workshop proceedings and Journal Special Issue*

Participating teams will be invited to contribute to the Proceedings of
the Seventh BioCreative Challenge Evaluation Workshop. Proceedings papers
are free of charge.

A selected number of top-performing teams will also be invited to
contribute a longer system description paper to a special issue on
BioCreative VII, to be published in the journal Database.

*Task organizers: *

Martin Krallinger, Barcelona Supercomputing Center, Spain

Antonio Miranda, Barcelona Supercomputing Center, Spain

Farrokh Mehryary, University of Turku, Finland

Jouni Luoma, University of Turku, Finland

Sampo Pyysalo, University of Turku, Finland

Alfonso Valencia, Barcelona Supercomputing Center, Spain
