<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><div><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span style="color: #000000;" data-mce-style="color: #000000;"><span style="font-family: Liberation\ Serif, serif;" data-mce-style="font-family: Liberation\ Serif, serif;"><span style="font-size: medium;" data-mce-style="font-size: medium;"><span lang="en-US"><b>Postdoctoral Position (12 months)</b></span></span></span></span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span style="color: #000000;" data-mce-style="color: #000000;"><span style="font-family: Liberation\ Serif, Times\ New\ Roman, serif;" data-mce-style="font-family: Liberation\ Serif, Times\ New\ Roman, serif;"><span style="font-size: medium;" data-mce-style="font-size: medium;"><span lang="en-US"><b>Natural language processing: </b></span></span></span></span><span style="font-family: Liberation\ Serif, Times\ New\ Roman, serif;" data-mce-style="font-family: Liberation\ Serif, Times\ New\ Roman, serif;"><span style="font-size: medium;" data-mce-style="font-size: medium;"><span lang="en-US"><b> automatic speech recognition system using deep neural networks without out-of-vocabulary words </b></span></span></span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left">_______________________________________</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Location:</b></span><span lang="en-US"> </span><span lang="en-AU">INRIA Nancy Grand Est research center, France</span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Research 
theme:</b></span><span lang="en-US"> PERCEPTION, COGNITION, INTERACTION</span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Project-team:</b></span><span lang="en-US"> Multispeech</span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Scientific Context: </b></span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">More and more audio and video content appears on the Internet each day. About 300 hours of multimedia are uploaded per minute. Audio data makes up a large part of these multimedia sources. If these documents are not transcribed, automatic content retrieval is difficult or impossible. The classical approach to spoken content retrieval from audio documents is automatic speech recognition followed by text retrieval.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="justify"><span lang="en-US">An automatic speech recognition (ASR) system uses a </span><span lang="en-US"><i><b>lexicon</b></i></span><span lang="en-US"> containing the most frequent words of the language, and only the words in the lexicon can be recognized by the system. </span><span lang="en-US"><i><b>New Proper Names</b></i></span><span lang="en-US"> (PNs) appear constantly, requiring dynamic updates of the lexicons used by the ASR system. These PNs evolve over time, and no vocabulary will ever contain all existing PNs. 
When people search for a document, they use proper names in the query. If these PNs have not been recognized, the document cannot be found. These missing PNs can be very important for understanding the document.</span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="justify"><span lang="en-US">In this study, we will focus on </span><span lang="en-US"><i><b>the problem of proper names in automatic recognition systems</b></i></span><span lang="en-US">. The problem is how to model relevant proper names for the audio document we want to transcribe. </span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><b>- Missions:</b></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="justify"><span lang="en-US">We assume that the audio document to be transcribed contains missing proper names, i.e. proper names that are pronounced in the audio document but are not in the lexicon of the automatic speech recognition system; these proper names cannot be recognized (out-of-vocabulary proper names, OOV PNs). </span><span lang="en-US"><i><b>The purpose of this work is to design a methodology to find and model a list of relevant OOV PNs that correspond to an audio document. 
</b></i></span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">Assuming that we have an approximate transcription of the audio document and a huge text corpus extracted from the Internet, several methodologies could be studied:</p><ul><li><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">From the approximate OOV pronunciation in the transcription, generate the possible spellings of the word (phoneme-to-character conversion) and search for this word in the text corpus.</p></li><li><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">A deep neural network can be designed to predict OOV proper names and their pronunciations, with the training objective of maximizing the retrieval of relevant OOV proper names.</p></li></ul><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="justify"><span lang="en-US">The proposed approaches will be validated using the ASR system developed in our team.</span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><b>Keywords</b>: deep neural networks, automatic speech recognition, lexicon, out-of-vocabulary words.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Bibliography </b></span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">[Mikolov2013] Mikolov, T., Chen, K., Corrado, G. 
and Dean, J. “Efficient estimation of word representations in vector space”, Workshop at ICLR, 2013.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">[Deng2013] Deng, L., Li, J., Huang, J.-T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y. and Acero, A. “Recent advances in deep learning for speech research at Microsoft”, Proceedings of ICASSP, 2013.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">[Sheikh2016] Sheikh, I., Illina, I., Fohr, D. and Linarès, G. “Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition”, Interspeech, 2016.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify">[Li2017] Li, J., Ye, G., Zhao, R., Droppo, J. and Gong, Y. “Acoustic-to-Word Model without OOV”, ASRU, 2017.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Skills and profile</b></span><span lang="en-US">: PhD in computer science, background in statistics and natural language processing, experience with deep learning tools (</span><span lang="en-US"><i>Keras, Kaldi, etc.</i></span><span lang="en-US">) and computer programming skills (</span><span lang="en-US"><i>Perl, Python</i></span><span lang="en-US">).<br> </span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">- </span><span lang="en-US"><b>Additional information:</b></span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 
0.35cm; margin: 0px;" align="left"><span lang="en-US">Supervision and contact: Irina Illina, LORIA/INRIA (</span><span style="color: #0000ff;" data-mce-style="color: #0000ff;"><span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"><a class="western" href="mailto:illina@loria.fr" data-mce-href="mailto:illina@loria.fr"><span lang="en-US">illina@loria.fr</span></a></span></span><span lang="en-US">), Dominique Fohr, INRIA/LORIA (</span><span style="color: #0000ff;" data-mce-style="color: #0000ff;"><span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"><a class="western" href="mailto:dominique.fohr@loria.fr" data-mce-href="mailto:dominique.fohr@loria.fr"><span lang="en-US">dominique.fohr@loria.fr</span></a></span></span><span lang="en-US">) </span><span style="color: #0000ff;" data-mce-style="color: #0000ff;"><span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"><a class="western" href="https://members.loria.fr/IIllina/" data-mce-href="https://members.loria.fr/IIllina/"><span lang="en-US">https://members.loria.fr/IIllina/</span></a></span></span><span lang="en-US">, </span><span style="color: #0000ff;" data-mce-style="color: #0000ff;"><span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"><a class="western" href="https://members.loria.fr/DFohr/" data-mce-href="https://members.loria.fr/DFohr/"><span lang="en-US">https://members.loria.fr/DFohr/</span></a></span></span><br data-mce-bogus="1"></p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" lang="en-US" align="left">Additional links: Ecole Doctorale IAEM Lorraine</p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" lang="en-US" 
align="left"><br></p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" lang="en-US" align="left"><b>Deadline to apply</b>: June 6<sup>th</sup></p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" lang="en-US" align="left"><b>Selection results</b>: end of June</p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" align="left"><br></p><p class="western" style="text-indent: 0.35cm; margin: 0px;" data-mce-style="text-indent: 0.35cm; margin: 0px;" align="left"><span lang="en-US">Duration: 12 months.</span><span lang="en-US"> </span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" align="left"><span lang="en-US">Starting date: between Nov. 1</span><sup><span lang="en-US">st</span></sup><span lang="en-US"> 2018 and Jan. 1</span><sup><span lang="en-US">st</span></sup><span lang="en-US"> 2019<br> Salary: about 2,115 euros net, medical insurance included</span></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US">Candidates must have defended their PhD after Sept. 1st 2016 and before the end of 2018. </p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-GB">Candidates are required to provide the following documents in a single PDF or ZIP file: </p><ul><li><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-GB">CV including a description of your research activities (2 pages max) and a short description of what you consider to be your best contributions and why (1 page max and 3 contributions max); the contributions could be theoretical or practical. Web links to the contributions should be provided. 
Also include a brief description of your scientific and career projects, and your scientific positioning regarding the proposed subject.</p></li><li><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-GB">The report(s) from your PhD external reviewer(s), if applicable.</p></li><li><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-GB">If you haven't defended yet, the list of expected members of your PhD committee (if known) and the expected date of defence.</p></li></ul><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-GB">In addition, at least one recommendation letter from the PhD advisor should be sent directly by its author(s) to the prospective postdoc advisor.</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="justify"><br></p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left">Help and benefits:</p><p class="western" style="margin: 0px;" data-mce-style="margin: 0px;" lang="en-US" align="left"><br></p><ul><li><p style="margin: 0px; line-height: 115%;" data-mce-style="margin: 0px; line-height: 115%;" lang="en-US" align="left"><span style="font-family: Times\ New\ Roman, serif;" data-mce-style="font-family: Times\ New\ Roman, serif;"><span style="font-size: small;" data-mce-style="font-size: small;">Possibility of free French courses</span></span></p></li><li><p style="margin: 0px; line-height: 115%;" data-mce-style="margin: 0px; line-height: 115%;" lang="en-US" align="left"><span style="font-family: Times\ New\ Roman, serif;" data-mce-style="font-family: Times\ New\ Roman, serif;"><span style="font-size: small;" data-mce-style="font-size: small;">Help with finding housing</span></span></p></li><li><p style="margin: 0px; line-height: 115%;" data-mce-style="margin: 0px; line-height: 115%;" lang="en-US" align="left"><span style="font-family: Times\ New\ Roman, serif;" 
data-mce-style="font-family: Times\ New\ Roman, serif;"><span style="font-size: small;" data-mce-style="font-size: small;">Help with the residence card procedure and with the spouse visa</span></span></p></li></ul><p style="margin: 0px; line-height: 115%;" data-mce-style="margin: 0px; line-height: 115%;" lang="en-US" align="left"><br></p></div><div><br></div><div>-- <br></div><div><span name="x"></span>Associate Professor <br>Lorraine University<br>LORIA-INRIA<br>office C147 <br>Building C <br>615 rue du Jardin Botanique<br>54600 Villers-les-Nancy Cedex<br>Tel: +33 3 54 95 84 90<span name="x"></span><br></div></div></body></html>