Connectionists: Master2 (research) position at Multispeech Team, LORIA (Nancy, France)

Wed Dec 2 11:28:56 EST 2015

Automatic speech recognition: contextualisation of the language model based on neural networks by dynamic adjustment 

Framework of ANR project ContNomina 

The technologies involved in information retrieval in large audio/video databases are often based on the analysis of large, but closed, corpora, and on machine learning techniques and statistical modeling of written and spoken language. The effectiveness of these approaches is now widely acknowledged, but they nevertheless have major flaws, particularly concerning proper names, which are crucial for interpreting the content.

In the context of diachronic data (data that change over time), new proper names appear constantly, requiring dynamic updates of the lexicons and language models used by the speech recognition system.

The ANR project ContNomina (2013-2017) therefore focuses on the problem of proper names in automatic audio processing systems, exploiting the context of the processed documents as efficiently as possible. Within this project, the student will address the contextualization of the recognition module through dynamic adjustment of the language model in order to make it more accurate.

Subject 

Current systems for automatic speech recognition are based on statistical approaches. They require three components: an acoustic model, a lexicon and a language model. This internship will focus on the language model. The language model of our recognition system is based on a neural network trained on a large text corpus. The problem is to re-estimate the language model parameters for a new proper name from its context and a small amount of adaptation data. Several directions can be explored: adapting the language model, using a class-based model, or studying the notion of analogy.
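To make the class-based direction concrete, here is a minimal sketch in Python. It is not the team's system: it swaps the neural language model for a toy count-based class bigram model, and the class names, vocabulary, and the pseudo_count parameter are all hypothetical choices made for illustration. The idea shown is that a new proper name can inherit its contextual behaviour from a class (here NAME), so only its in-class weight has to be estimated from a small amount of adaptation data.

```python
from collections import Counter


class ClassBigramLM:
    """Toy class-based bigram model (illustrative assumption, not the LORIA system).

    P(word | prev) = P(class(word) | class(prev)) * P(word | class(word))
    """

    def __init__(self, word_to_class):
        self.word_to_class = dict(word_to_class)
        self.class_bigrams = Counter()   # (prev_class, class) -> count
        self.class_context = Counter()   # prev_class -> count
        self.word_counts = Counter()     # word -> count

    def train(self, sentences):
        for sentence in sentences:
            classes = [self.word_to_class[w] for w in sentence]
            self.word_counts.update(sentence)
            for prev_c, c in zip(classes, classes[1:]):
                self.class_bigrams[(prev_c, c)] += 1
                self.class_context[prev_c] += 1

    def add_word(self, word, word_class, pseudo_count=1):
        """Add a new word to an existing class without retraining class bigrams.

        pseudo_count stands in for the small amount of adaptation data.
        """
        self.word_to_class[word] = word_class
        self.word_counts[word] += pseudo_count

    def prob(self, prev, word):
        prev_c = self.word_to_class[prev]
        c = self.word_to_class[word]
        if self.class_context[prev_c] == 0:
            return 0.0
        p_class = self.class_bigrams[(prev_c, c)] / self.class_context[prev_c]
        class_total = sum(
            n for w, n in self.word_counts.items() if self.word_to_class[w] == c
        )
        return p_class * self.word_counts[word] / class_total


# Hypothetical toy data.
lm = ClassBigramLM(
    {"president": "TITLE", "obama": "NAME", "merkel": "NAME", "said": "VERB"}
)
lm.train([
    ["president", "obama", "said"],
    ["president", "merkel", "said"],
    ["president", "obama", "said"],
])

# A proper name unseen in training is added to the NAME class.
lm.add_word("macron", "NAME", pseudo_count=1)
p_new = lm.prob("president", "macron")
```

After add_word, the new name receives a non-zero probability in every context where the NAME class was observed, and the in-class probabilities still sum to one. Doing the same for a neural language model is the open question the internship addresses.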

Our team has developed a fully automatic speech recognition system that transcribes a radio broadcast from the corresponding audio file. The student will develop a new module whose function is to integrate new proper names into the language model.

Required skills 

Background in statistics and object-oriented programming. 

Location and contacts

LORIA laboratory, Multispeech team, Nancy, France

Irina.illina at loria.fr dominique.fohr at loria.fr 

Candidates should email a detailed CV and a copy of their diploma.

References 

[1] J. Gao, X. He, and L. Deng. Deep Learning for Web Search and Natural Language Processing. Microsoft slides, 2015.

[2] X. Liu, Y. Wang, X. Chen, M. J. F. Gales, and P. C. Woodland. Efficient lattice rescoring using recurrent neural network language models. In Proc. ICASSP, 2014, pp. 4941–4945.

[3] M. Sundermeyer, H. Ney, and R. Schlüter. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 3, pp. 517–529, March 2015.

-- 
Associate Professor 
Lorraine University 
LORIA-INRIA 
office C147 
Building C 
615 rue du Jardin Botanique 
54600 Villers-les-Nancy Cedex 
Tel:+ 33 3 54 95 84 90 