Connectionists: Open PhD Position in Deep Probabilistic Reinforcement Learning for Audio-Visual Human-Robot Interaction at INRIA Grenoble

Chris Reinke c.reinke85 at gmail.com
Sat Jun 13 08:33:42 EDT 2020


Open PhD Position in Deep Probabilistic Reinforcement Learning for
Audio-Visual Human-Robot Interaction at INRIA Grenoble

More information and application procedure:
https://jobs.inria.fr/public/classic/en/offres/2020-02718

Starting date: 2020-10-01
Duration of contract: 3 years
Deadline to apply: 2020-07-08


Description:

Reinforcement learning, and in particular deep reinforcement learning
(DRL), has become very popular in recent years, successfully addressing a
wide variety of tasks such as board-game playing. It has also been used to
address computer vision and pattern recognition tasks for which a
differentiable loss function is difficult to find or does not exist [1].
The most popular methodology in DRL is to approximate the so-called
action-value function, leading to deep Q-networks (DQN) [2] and their
derivatives. In these methods, both the action policy and the system's
transition function are implicitly learned within the neural network that
approximates the action-value function. This is limiting, since it is
unclear how to incorporate prior knowledge of the policy or of the system.
Alternatives to the mainstream methodology exist; they can be based, for
instance, on parametrising a deterministic policy and optimising the
parametrisation by gradient descent [3]. Probabilistic alternatives based
on the EM algorithm were first proposed in the late 1990s [4] and revisited
in the 2010s; see, for instance, [5].
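To make the idea concrete, here is a toy, self-contained sketch (not from
the project, and using a linear approximator rather than a deep network)
of learning an action-value function on a hypothetical two-state MDP; as
in DQN-style methods, the policy is only implicit, read off greedily from
the learned values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
w = np.zeros((n_states, n_actions))  # linear weights, one per (s, a) feature

def step(s, a):
    # Hypothetical dynamics: action 1 yields reward 1 and toggles the state.
    if a == 1:
        return 1 - s, 1.0
    return s, 0.0

gamma, alpha, eps = 0.9, 0.1, 0.2
s = 0
for _ in range(2000):
    # Epsilon-greedy action selection on the current value estimates.
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(w[s]))
    s2, r = step(s, a)
    td_error = r + gamma * np.max(w[s2]) - w[s, a]
    w[s, a] += alpha * td_error  # semi-gradient TD update
    s = s2

# The greedy policy is read off the action-value estimates:
print(int(np.argmax(w[0])), int(np.argmax(w[1])))
```

Note that no explicit policy or transition model appears anywhere in the
code; this is exactly the property the probabilistic alternatives above
try to relax.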

In parallel, progress in deep learning led to the conception of
variational auto-encoders (VAEs) [6], which are non-linear probabilistic
generative models. The use of VAEs for reinforcement learning is at a very
early stage [7]: actions, rewards and observations are jointly considered
to infer a latent state, which is designed to encode the generative
process of both the policy and the value function. In this PhD we would
like to investigate the use of VAE-based RL for audio-visual human-robot
interaction. We aim to combine VAEs for RL with probabilistic models
developed for other tasks, such as speaker tracking or speech enhancement.
VAEs are a prominent research line, but we are open to other ideas. Our
team has expertise in probabilistic models for a variety of tasks [8,9] as
well as in reinforcement learning for HRI [10].
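For readers less familiar with VAEs, the two ingredients that make them
trainable by gradient descent are the reparameterisation trick and the
closed-form Gaussian KL term of the ELBO. The minimal sketch below (an
illustration only, not the team's codebase) shows both for a diagonal
Gaussian latent:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterise(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I): sampling becomes a
    # deterministic, differentiable function of the encoder outputs.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.array([0.5, -0.3])
log_var = np.zeros(2)  # unit variance
z = reparameterise(mu, log_var)
print(kl_to_standard_normal(mu, log_var))  # 0.5 * (0.25 + 0.09) = 0.17
```

In VAE-based RL as in [7], the same machinery is used to infer a latent
state from sequences of actions, rewards and observations.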

Our team is searching for a motivated PhD candidate to investigate new
approaches for deep reinforcement learning (DRL) in the field of
audio-visual human-robot interaction. Despite its high potential, DRL is
still in its infancy when it comes to real-world applications such as
robotics. Our group investigates DRL methods for the control of robots
based on audio-visual inputs [10]. Our current project, in cooperation with
several European partners [11], develops new approaches to enable
health-care robots to interact and communicate with groups of people by
providing information or guiding them. In contrast to existing methods, we
are looking into approaches that combine visual and auditory information
to improve, for example, the identification of an active speaker or of
their location.

The PhD work will take place at Inria Grenoble, in Montbonnot-Saint-Martin,
in the Perception Team, headed by Radu Horaud. It will be supervised by
Chris Reinke (Post-doctoral Researcher) & Xavier Alameda-Pineda (Inria
Research Scientist).


Skills:

A research Master's degree, or equivalent, in a discipline connected to
signal and information processing, computer vision or machine learning.
The candidate should be willing to independently study new approaches and
to develop their own ideas in this field, drawing inspiration from the
description above and from progress in the literature. The candidate
should preferably have a background in artificial intelligence, machine
learning, computer science or applied mathematics, and should have
programming experience, preferably in Python.


Remuneration:

 - 1st and 2nd year: 1982 euros gross / month
 - 3rd year: 2085 euros gross / month


Benefits package:

 - Subsidized meals
 - Partial reimbursement of public transport costs
 - Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory
reduction in working hours) + possibility of exceptional leave (sick
children, moving home, etc.)
 - Possibility of teleworking (after 6 months of employment) and flexible
organization of working hours
 - Professional equipment available (videoconferencing, loan of computer
equipment, etc.)
 - Social, cultural and sports events and activities
 - Access to vocational training
 - Social security coverage


All the best,
Chris Reinke

--
Postdoctoral Researcher
Perception Unit
Inria Grenoble
www.scirei.net



References

[1] Ren, Liangliang, Jiwen Lu, Zifeng Wang, Qi Tian, and Jie Zhou.
"Collaborative deep reinforcement learning for multi-object tracking." In
Proceedings of the European Conference on Computer Vision (ECCV), pp.
586-602. 2018.
[2] Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel
Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through
deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
[3] Casas, Noe. "Deep deterministic policy gradient for urban traffic light
control." arXiv preprint arXiv:1703.09035 (2017).
[4] Dayan, Peter, and Geoffrey E. Hinton. "Using expectation-maximization
for reinforcement learning." Neural Computation 9, no. 2 (1997): 271-278.
[5] Vlassis, Nikos, Marc Toussaint, Georgios Kontes, and Savas Piperidis.
"Learning model-free robot control by a Monte Carlo EM algorithm."
Autonomous Robots 27, no. 2 (2009): 123-130.
[6] Kingma, Diederik P., and Max Welling. "Auto-encoding variational
Bayes." arXiv preprint arXiv:1312.6114 (2013).
[7] Igl, Maximilian, Luisa Zintgraf, Tuan Anh Le, Frank Wood, and Shimon
Whiteson. "Deep variational reinforcement learning for POMDPs." arXiv
preprint arXiv:1806.02426 (2018).
[8] Ban, Yutong, Xavier Alameda-Pineda, Laurent Girin, and Radu Horaud.
"Variational Bayesian inference for audio-visual tracking of multiple
speakers." IEEE Transactions on Pattern Analysis and Machine Intelligence
(2019).
[9] Sadeghi, Mostafa, and Xavier Alameda-Pineda. "Robust unsupervised
audio-visual speech enhancement using a mixture of variational
autoencoders." In ICASSP 2020-2020 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 7534-7538. IEEE, 2020.
[10] Lathuilière, Stéphane, Benoît Massé, Pablo Mesejo, and Radu Horaud.
"Neural network based reinforcement learning for audio–visual gaze control
in human–robot interaction." Pattern Recognition Letters 118 (2019): 61-71.
[11] https://spring-h2020.eu