Connectionists: internship Naver Labs Europe: speech+text

Matthias Gallé mgalle at gmail.com
Wed Dec 1 03:02:20 EST 2021


Dear all,

This is an internship opportunity for a talented student interested in
improving machine translation with speech data.

Apply at:

https://europe.naverlabs.com/job/using-monolingual-speech-data-to-improve-multilingual-translation-models-internship/


**Using monolingual speech data to improve multilingual translation models**

A large share of today's 7000+ languages have no writing system, and
many more have only a very small amount of available textual data. As an
example, while Wikipedia exists in 264 languages, only 100 of those have
more than 5000 pages.
In this internship we plan to investigate how to leverage monolingual
speech data to improve multilingual text translation systems. To do so, we
will build on existing work in speech-to-text translation: models
that start from large pre-trained models (e.g., [1, 2]), as well as our
previous experience with end-to-end speech translation models [3] and the
use of monolingual data to improve translation systems for unseen
languages [4].
At NLE we pride ourselves on very close collaboration with our interns,
with regular meetings and joint brainstorming and
development. Interns are integrated into existing teams and participate
actively in the scientific activities of the centre.

*REQUIRED SKILLS*
- enrolled in a PhD or research master's programme in NLP,
speech processing or applied machine learning
- experience in at least one of machine translation, ASR or multi-task
learning
- good knowledge of TensorFlow or (preferably) PyTorch
- a track record of papers published at top-tier conferences is a plus

*REFERENCES*
[1] Tang, Yun, et al. "Improving speech translation by understanding and
learning from the auxiliary text translation task." arXiv preprint
arXiv:2107.05782 (2021).
[2] Li, Xian, et al. "Multilingual speech translation with efficient
finetuning of pretrained models." arXiv preprint arXiv:2010.12829 (2020).
[3] Bérard, Alexandre, et al. "Listen and translate: A proof of concept for
end-to-end speech-to-text translation." arXiv preprint arXiv:1612.01744
(2016).
[4] Üstün, Ahmet, et al. "Multilingual unsupervised neural machine
translation with denoising adapters." arXiv preprint arXiv:2110.10472
(2021).