<div dir="ltr">Open PhD Position in Deep Probabilistic Reinforcement Learning for Audio-Visual Human-Robot Interaction at INRIA Grenoble<br><br>More information and application procedure:<br><a href="https://jobs.inria.fr/public/classic/en/offres/2020-02718">https://jobs.inria.fr/public/classic/en/offres/2020-02718</a><br><br>Starting date: 2020-10-01<br>Duration of contract: 3 years<br>Deadline to apply: 2020-07-08<br><br><br>Description:<br><br>Reinforcement learning, and in particular deep reinforcement learning (DRL), became very popular in the recent, successfully addressing a wide variety of tasks such as board game playing. It has also been popular in addressing computer vision or pattern recognition tasks for which a differentiable loss function is difficult to find or does not exist [1]. The most popular methodology in DRL is to approximate the so-called action-value function, leading to deep Q networks (DQN) [2] and its derivatives. In these methods, both the action policy and the system’s transition function are implicitly learned within the neural network that approximates the action-value. This is limiting since it is unclear how to incorporate prior knowledge of the policy or of the system. Alternatives to the mainstream methodology exists and they can be based, for instance, in parametrising a deterministic policy, and optimizing the parametrisation by gradient descent [3]. Probabilistic alternatives based on the EM algorithm were first proposed in the late 90’s [4], and revisited in the 2010’s, see for instance [5].<br><br>In parallel, progress on deep learning lead to the conception of variational auto-encoders [6], which are non-linear probabilistic generative models. The use of VAE for reinforcement learning is at its very early stage [7], in which all actions, rewards and observations are jointly considered to infer a latent state, which is designed to encode the generation process of both the policy and the value function. In this PhD we would to investigate the use of VAE-based RL for audio-visual human-robot interaction. We aim to combine VAE for RL with probabilistic models developed for other tasks, such as speaker tracking or speech enhancement. VAE is a prominent research line but we are open to other ideas. Our team has expertise in probabilistic models for a variety of tasks [8,9] as well as on reinforcement learning for HRI [10].<br><br>Our team is searching for a motivated PhD candidate to investigate new approaches for deep reinforcement learning (DRL) in the field of audio-visual human-robot interaction. Although its high potential, DRL is still in its infancies when it comes to real-world applications such as robotics. Our group investigates DRL methods for the control of robots based on audio-visual inputs [10]. Our current project, in cooperation with several European partners [11], develops new approaches to enable health-care robots to interact and communicate with groups of people by providing information or guiding them. In difference to existing methods we are looking into approaches that combine visual and auditory information which improve for example the identification of an active speaker or their location.<br><br>The PhD work will take place at Inria Grenoble, in Montbonnot-Saint-Martin, in the Perception Team, headed by Radu Horaud. It will be supervised by Chris Reinke (Post-doctoral Researcher) & Xavier Alameda-Pineda (Inria Research Scientist).<br><br><br>Skills:<br><br>Research Master's degree, or equivalent, in a discipline connected to signal and information processing, computer vision and machine learning. The candidate should be willing to study independently new approaches and to develop their own ideas in this field, getting inspiration from the previous description and the progress on the literature. The candidate should have preferably a background in artificial intelligence, machine learning, computer science or applied mathematics. Moreover, a candidate should have knowledge in programming, preferable in Python.<br><br><br>Remuneration:<br><br> - 1st and 2nd year: 1982 euros brut /month<br> - 3rd year: 2085 euros brut / month<br><br> <br>Benefits package:<br><br> - Subsidized meals<br> - Partial reimbursement of public transport costs<br> - Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)<br> - Possibility of teleworking (after 6 months of employment) and flexible organization of working hours<br> - Professional equipment available (videoconferencing, loan of computer equipment, etc.)<br> - Social, cultural and sports events and activities<br> - Access to vocational training<br> - Social security coverage<br><br> <br>All the best,<br>Chris Reinke<br><br>--<br>Postdoctoral Researcher<br>Perception Unit<br>Inria Grenoble<br><a href="http://www.scirei.net">www.scirei.net</a><br> <br> <br> <br>References<br><br>[1] Ren, Liangliang, Jiwen Lu, Zifeng Wang, Qi Tian, and Jie Zhou. "Collaborative deep reinforcement learning for multi-object tracking." In Proceedings of the European Conference on Computer Vision (ECCV), pp. 586-602. 2018.<br>[2] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.<br>[3] Casas, Noe. "Deep deterministic policy gradient for urban traffic light control." arXiv preprint arXiv:1703.09035 (2017).<br>[4] Dayan, Peter, and Geoffrey E. Hinton. "Using expectation-maximization for reinforcement learning." Neural Computation 9, no. 2 (1997): 271-278.<br>[5] Vlassis, Nikos, Marc Toussaint, Georgios Kontes, and Savas Piperidis. "Learning model-free robot control by a Monte Carlo EM algorithm." Autonomous Robots 27, no. 2 (2009): 123-130.<br>[6] Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).<br>[7] Igl, Maximilian, Luisa Zintgraf, Tuan Anh Le, Frank Wood, and Shimon Whiteson. "Deep variational reinforcement learning for pomdps." arXiv preprint arXiv:1806.02426 (2018).<br>[8] Ban, Yutong, Xavier Alameda-Pineda, Laurent Girin, and Radu Horaud. "Variational bayesian inference for audio-visual tracking of multiple speakers." IEEE transactions on pattern analysis and machine intelligence (2019).<br>[9] Sadeghi, Mostafa, and Xavier Alameda-Pineda. "Robust unsupervised audio-visual speech enhancement using a mixture of variational autoencoders." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7534-7538. IEEE, 2020.<br>[10] Lathuilière, Stéphane, Benoît Massé, Pablo Mesejo, and Radu Horaud. "Neural network based reinforcement learning for audio–visual gaze control in human–robot interaction." Pattern Recognition Letters 118 (2019): 61-71.<br>[11] <a href="https://spring-h2020.eu">https://spring-h2020.eu</a></div>