[AI Seminar] Online AI Seminar on May 26 (Zoom) -- Thodoris Lykouris -- Corruption robust exploration in episodic reinforcement learning. AI seminar is sponsored by Fortive.
Aayush Bansal
aayushb at cs.cmu.edu
Mon May 25 20:54:45 EDT 2020
Reminder: this is tomorrow at noon.
It is the last seminar of this semester. Happy Summer!
On Wed, May 20, 2020 at 5:07 PM Aayush Bansal <aayushb at cs.cmu.edu> wrote:
> Thodoris Lykouris (Microsoft Research, NYC) will be giving an online
> seminar on "Corruption robust exploration in episodic reinforcement
> learning" from *12:00 - 01:00 PM* on May 26.
>
> Zoom Link: https://cmu.zoom.us/j/262225154
>
> CMU AI Seminar is sponsored by Fortive.
>
> Following are the details of the talk:
>
> *Title: *Corruption robust exploration in episodic reinforcement learning
>
> *Abstract: *We initiate the study of multi-stage episodic reinforcement
> learning under adversarial corruptions in both the rewards and the
> transition probabilities of the underlying system, extending recent results
> for the special case of stochastic bandits. We provide a framework which
> modifies the aggressive exploration enjoyed by existing reinforcement
> learning approaches based on "optimism in the face of uncertainty", by
> complementing them with principles from "action elimination". Importantly,
> our framework circumvents the major challenges posed by naively applying
> action elimination in the RL setting, as formalized by a lower bound we
> demonstrate. Our framework yields efficient algorithms which (a) attain
> near-optimal regret in the absence of corruptions and (b) adapt to unknown
> levels of corruption, enjoying regret guarantees which degrade gracefully in
> the total corruption encountered. To showcase the generality of our
> approach, we derive results for both tabular settings (where states and
> actions are finite) as well as linear-function-approximation settings
> (where the dynamics and rewards admit a linear underlying representation).
> Notably, our work provides the first sublinear regret guarantee which
> accommodates any deviation from purely i.i.d. transitions in the
> bandit-feedback model for episodic reinforcement learning.
>
> *Bio*: Thodoris Lykouris is a postdoctoral researcher in the machine
> learning group of Microsoft Research NYC. His research focus is on online
> decision-making spanning across the disciplines of machine learning,
> theoretical computer science, operations research, and economics. He
> completed his Ph.D. at Cornell University in 2019, where he was advised by
> Eva Tardos. During his Ph.D. years, his research was generously
> supported by a Google Ph.D. Fellowship and a Cornell University Fellowship.
> He was also a finalist in the INFORMS Nicholson and Applied Probability
> Society best student paper competitions.
>
>
> To learn more about the seminar series, please visit the website:
> http://www.cs.cmu.edu/~aiseminar/
>
>
> --
> Aayush Bansal
> http://www.cs.cmu.edu/~aayushb/
>
>
--
Aayush Bansal
http://www.cs.cmu.edu/~aayushb/