Please come to Ian Char's PhD thesis defense starting at 10am in GHC 6115!

Jeff Schneider jeff4 at andrew.cmu.edu
Thu Apr 11 09:19:05 EDT 2024




-------- Forwarded Message --------
Subject: 	Thesis Defense - April 11, 2024 - Ian Char - Advancing 
Model-Based Reinforcement Learning with Applications in Nuclear Fusion
Date: 	Thu, 28 Mar 2024 14:16:52 -0400
From: 	Diane Stidle <stidle at andrew.cmu.edu>
Reply-To: 	stidle at andrew.cmu.edu
To: 	ml-seminar at cs.cmu.edu <ML-SEMINAR at CS.CMU.EDU>, 
riedmiller at google.com, ekolemen at pppl.gov



Thesis Defense

Date: April 11, 2024
Time: 10:00am (EDT)
Place: GHC 6115 & Remote
PhD Candidate: Ian Char

Title: Advancing Model-Based Reinforcement Learning with Applications in Nuclear Fusion

Abstract:

Reinforcement learning (RL) may be the key to overcoming previously 
insurmountable obstacles, leading to technological and scientific 
innovations. One area where RL could have a sizable impact is 
tokamak control. Tokamaks are among the most promising devices for 
making nuclear fusion a viable energy source. They operate by 
magnetically confining a plasma; however, sustaining the plasma for long 
periods of time and at high pressures remains a challenge for the 
tokamak control community. RL may be able to learn how to sustain the 
plasma, but, as with many exciting applications of RL, it is infeasible 
to collect enough data on the real device to learn a policy.

In this thesis, we explore learning policies with surrogate models of 
the environment, particularly surrogate models learned from an offline 
data source. To start, in Part I we investigate the scenario in which 
one has access to a simulator that can generate data, but the simulator 
is too computationally taxing to use with data-hungry deep RL 
algorithms. Instead, we propose a Bayesian optimization algorithm to 
learn a policy in this setting. Following this, we pivot to the setting 
in which surrogate models of the environment can be learned from 
offline data. While these models are far cheaper computationally, their 
predictions inevitably contain errors. As
such, both robust policy learning procedures and good uncertainty 
quantification of model errors are crucial for success. To address the 
former, in Part II we propose a trajectory stitching algorithm that 
accounts for these modeling errors and a policy network architecture 
that is adaptive, yet robust. Part III shifts focus onto uncertainty 
quantification, where we propose a more intelligent uncertainty sampling 
procedure and a neural process architecture for learning uncertainties 
efficiently. In the final part, we detail how we learned models to 
predict plasma evolution, how we used these models to train a neutral 
beam controller, and the results of deploying this controller on the 
DIII-D tokamak.

Thesis Committee:
Jeff Schneider, Chair
Ruslan Salakhutdinov
Zico Kolter
Martin Riedmiller (DeepMind)
Egemen Kolemen (Princeton)

Link to Draft Document: 
https://drive.google.com/file/d/1VQAZDuvRA1GfEfZkGS6EfzFd-zovetU1/view?usp=sharing

Link to Zoom meeting:
https://www.google.com/url?q=https://cmu.zoom.us/j/94461753500?pwd%3DN1FmTktDWWU5cDkwM0szWWxvSXNndz09&sa=D&source=calendar&ust=1712067446243633&usg=AOvVaw0pAS1H8u4VyGICh2A69iS2 

-- 
Diane Stidle
PhD Program Manager
Machine Learning Department
Carnegie Mellon University
stidle at andrew.cmu.edu


