<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div>Please find below an offer for an intership from February/March to July/August 2024 in Mnemosyne team at Inria, Bordeaux, France</div><div><br></div><div># Title</div><div>Title: Evolving Reservoirs for Meta Reinforcement Learning</div><div><br></div><div># Supervision</div><div>Xavier Hinaut, Mnemosyne team, Inria Bordeaux</div><div>Potential collaboration with Clément Moulin-Frier (Flowers Team) & Pierrick Legrand (Astral Team)</div><div><br></div><div># Duration:</div><div>6 months, spring/summer 2024. Level: Master 2 research internship</div><div><br></div><div># Keywords:</div><div>meta reinforcement learning, reservoir computing, evolutionary computation, neuroevolution, simulation environments, scientific programming with Python.</div><div><br></div><div># Research team & internship location</div><div>Mnémosyne Team: Inria Bordeaux Sud-Ouest, LABRI & Institut des Maladies Neurodégénératives (Centre Broca Aquitaine, Carreire campus)</div><div>https://team.inria.fr/mnemosyne</div><div><br></div><div># Project</div><div>How can a reinforcement learning (RL) agent autonomously learn a diversity of skills with minimal external supervision? Recent works have shown that a single goal-conditioned policy can be trained to solve multiple tasks in parallel (see e.g. Colas et al., 2021 for a recent review). Other works in Meta Reinforcement Learning (Meta-RL) have shown that artificial agents can “learn how to learn” (hence the term meta-learning) multiple skills without having access to goal-related information (see e.g. Weng, 2019 for an introduction).</div><div><br></div><div>Meta-RL aims at equipping agents with the ability to generalize to tasks or environments that have not been encountered during training. Two nested processes of adaptation are traditionally considered: an outer adaptation loop optimizing the hyperparameters of an inner adaptation loop. The inner adaptation loop can be a standard RL algorithm operating on a given environment. The outer loop is tuning the hyperparameters of the inner loop such that it performs well on a wide distribution of environments. The end result of this nested training process is an algorithm that learns how to learn, i.e. learns how to adapt to tasks that have never been encountered during training. Works differ in the optimization technique and adaptation mechanism they consider in the two loops. The outer loop often optimizes higher-level structures such as the architecture of a neural network (Baker et al., 2017), the morphology (Gupta et al., 2021), a curiosity module providing intrinsic rewards (Alet et al., 2020) or plasticity rules (Najarro & Risi, 2021), employing either evolutionary or gradient-based optimization.</div><div><br></div><div>In parallel, Reservoir Computing (RC, (Jaeger 2001+)) is particularly well suited for extracting information from data stream with complex temporal dynamics (e.g. weather forecast or language). It is a machine learning paradigm on sequential data where a recurrent neural network is only partially trained (e.g. only training a linear readout).</div><div><br></div><div><br></div><div>One of the major interests of these recurrent neural networks is their reduced computational cost and the possibility to learn both in on-line and off-line fashion. They have been successfully applied on a wide variety of tasks, from prediction/generation of chaotic timeseries to discrimination of audio sequences, such as bird song recognition. They offer de facto a fast, simple yet efficient way to train RNNs. This "reservoir of computations" works thanks to random projections in large dimensions, and is thus similar to temporal Support Vector Machines (SVM).</div><div><br></div><div>In this internship, we will explore how Reservoir Computing can be leveraged in the context of Meta-RL. In this context, we will consider evolving the architecture of a reservoir for performing well on a wide range of RL tasks (see Seoane, 2019 for theoretical arguments on this evolutionary perspective). For this aim, we will use the ReservoirPy Python library developed in the Mnemosyne team (Trouvain & Hinaut, 2022) that we will interface with evolutionary computation algorithms (e.g. Tang et al., 2022). The project will involve:</div><div><br></div><div>●<span class="Apple-tab-span" style="white-space:pre"> </span>Designing of a wide range of RL tasks in an existing simulation environment (ex MuJoCo)</div><div>●<span class="Apple-tab-span" style="white-space:pre"> </span>Learning how to learn multiple tasks without access to the goal information (i.e. meta-RL with a reservoir).</div><div>●<span class="Apple-tab-span" style="white-space:pre"> </span>Evolve reservoir architectures for meta-RL with evolutionary methods</div><div><br></div><div><br></div><div># How to apply</div><div>Contact xavier.hinaut@inria.fr with a CV and letter of motivation. We also recommend sending documents or reports describing previous projects you have been working on (even if they are not directly related to the topic), as well as your grades and links to some of your code repositories.</div><div><br></div><div># Requirements</div><div>We are looking for highly motivated MSc students (Master II). Programming skills and prior experience with Python and deep learning frameworks (Pytorch, Tensorflow) are expected.</div><div><br></div><div># Other internship offers</div><div><a href="https://github.com/neuronalX/internships/">https://github.com/neuronalX/internships/</a></div><div><br></div><div># References</div><div>Alet, F., Schneider, M. F., Lozano-Perez, T., & Kaelbling, L. P. (2020). Meta-learning curiosity algorithms. International Conference on Learning Representations (ICLR 2020). https://openreview.net/forum?id=BygdyxHFDS</div><div>Baker, B., Gupta, O., Naik, N., & Raskar, R. (2017). Designing neural network architectures using reinforcement learning. ArXiv:1611.02167 [Cs]. http://arxiv.org/abs/1611.02167</div><div>Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., & Hassabis, D. (2019). Reinforcement Learning, Fast and Slow. Trends in Cognitive Sciences, 23(5), 408–422. https://doi.org/10.1016/j.tics.2019.02.006</div><div>Colas, C., Karch, T., Sigaud, O., & Oudeyer, P.-Y. (2021). Intrinsically Motivated Goal-Conditioned Reinforcement Learning: A Short Survey. ArXiv:2012.09830 [Cs]. http://arxiv.org/abs/2012.09830</div><div>Gupta, A., Savarese, S., Ganguli, S., & Fei-Fei, L. (2021). Embodied intelligence via learning and evolution. Nature Communications, 12(1), 5721. https://doi.org/10.1038/s41467-021-25874-z</div><div>Jaeger, H. (2007). Echo state network. Scholarpedia, 2(9), 2330. https://doi.org/10.4249/scholarpedia.2330</div><div>Najarro, E., & Risi, S. (2021). Meta-Learning through Hebbian Plasticity in Random Networks. ArXiv:2007.02686 [Cs]. http://arxiv.org/abs/2007.02686</div><div>Sakemi, Y., Morino, K., Leleu, T., & Aihara, K. (2020). Model-size reduction for reservoir computing by concatenating internal states through time. Scientific Reports, 10(1), Article 1. https://doi.org/10.1038/s41598-020-78725-0</div><div>Seoane, L. F. (2019). Evolutionary aspects of reservoir computing. Philosophical Transactions of the Royal Society B: Biological Sciences, 374(1774), 20180377. https://doi.org/10.1098/rstb.2018.0377</div><div>Tang, Y., Tian, Y., & Ha, D. (2022). EvoJAX: Hardware-Accelerated Neuroevolution. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 308–311. https://doi.org/10.1145/3520304.3528770</div><div>Trouvain, N., & Hinaut, X. (2022). reservoirpy: A Simple and Flexible Reservoir Computing Tool in Python. https://hal.inria.fr/hal-03699931 https://github.com/reservoirpy/reservoirpy</div><div>Weng, L. (2019, June 23). Meta Reinforcement Learning. https://lilianweng.github.io/posts/2019-06-23-meta-rl/</div><div><br></div><div><br></div><div>
<meta charset="UTF-8"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div>Xavier Hinaut<br>Inria Research Scientist<br>www.xavierhinaut.com -- +33 5 33 51 48 01<br>Mnemosyne team, Inria, Bordeaux, France -- https://team.inria.fr/mnemosyne<br>& LaBRI, Bordeaux University -- https://www4.labri.fr/en/formal-methods-and-models<br>& IMN (Neurodegeneratives Diseases Institute) -- http://www.imn-bordeaux.org/en<br>---<br>Our Reservoir Computing library: https://github.com/reservoirpy/reservoirpy</div></div>
</div>
<br></body></html>