[AI Seminar] AI Seminar sponsored by Apple -- Wen Sun -- April 10

Adams Wei Yu weiyu at cs.cmu.edu
Sat Apr 7 17:02:23 EDT 2018


Dear faculty and students,

We look forward to seeing you next Tuesday, April 10, at noon in NSH 3305 for
the AI Seminar sponsored by Apple. To learn more about the seminar series,
please visit the AI Seminar webpage <http://www.cs.cmu.edu/~aiseminar/>.

On Tuesday, Wen Sun <http://www.cs.cmu.edu/~wensun/> will give the
following talk:

Title:  Efficient Reinforcement Learning via Imitation

Abstract:

A fundamental challenge in Artificial Intelligence (AI), robotics, and
natural language processing is sequential prediction: to reason, plan, and
make a sequence of predictions or decisions that minimizes accumulated
cost, achieves a long-term goal, or optimizes for a loss acquired only
after many predictions. Reinforcement Learning (RL), as a general framework
for learning from experience to make predictions and decisions, is often
considered one of the ideal tools for this challenge in AI. Recently,
equipped with advances from the Deep Learning literature, we have pushed
the state of the art of RL in a number of applications, including simulated
high-dimensional robotics control, video games, and board games (e.g.,
AlphaGo).
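
For readers who want the objective pinned down, one standard way to
formalize the accumulated-cost criterion is the following (the notation is
ours, for illustration, not quoted from the talk):

    J(\pi) = \mathbb{E}\Big[ \sum_{t=0}^{T-1} c(s_t, a_t) \Big],
    \quad a_t \sim \pi(\cdot \mid s_t), \; s_{t+1} \sim P(\cdot \mid s_t, a_t),

where T is the horizon and c is the per-step cost; the learner must find a
policy \pi minimizing J(\pi) from interaction data alone.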

Because of its generality (RL is a framework broad enough to subsume many
specialized machine learning algorithms and applications), RL is hard. As
there is no direct supervision, one central challenge in RL is how to
explore an unknown environment and collect useful feedback efficiently. In
recent RL success stories (e.g., super-human performance on video games
[Mnih et al., 2015]), we notice that most rely on random exploration
strategies, which usually require a huge number of interactions with the
environment before the agent learns anything useful. Another challenge is
credit assignment: if a learning agent successfully achieves some task
after making a long sequence of decisions, how can we assign credit for the
success among these decisions?
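
To make the cost of random exploration concrete, here is a small
self-contained Python experiment on a chain environment invented purely for
illustration (nothing below comes from the talk): a uniformly random policy
needs on the order of N^2 steps before it first reaches the goal state of
an N-state chain.

    import random

    def steps_to_goal_random(n_states, trials=200):
        """Average steps a uniformly random policy takes to walk from
        state 0 to state n_states - 1 on a chain (actions: left/right).
        The hitting time grows roughly quadratically in the chain length,
        so purely random exploration pays a steep price even here."""
        total = 0
        for _ in range(trials):
            s, steps = 0, 0
            while s < n_states - 1:
                s = max(s + random.choice((-1, 1)), 0)  # random action
                steps += 1
            total += steps
        return total / trials

    for n in (10, 20, 40):
        print(n, steps_to_goal_random(n))  # grows roughly like (n - 1)**2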

We first attempt to gain purchase on RL problems by introducing an
additional source of information: an expert who knows how to solve tasks
(near-)optimally. By imitating an expert, we can significantly reduce the
burden of exploration (we imitate instead of exploring randomly) and solve
the credit assignment problem (the expert tells us which decisions are
bad). We study, both in theory and in practice, how one can imitate experts
to reduce sample complexity compared to a pure RL approach.
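
As a deliberately simplified illustration of learning from expert
supervision, the sketch below fits a policy to expert state-action pairs by
plain supervised learning (behavioral cloning). The toy data, the feature
dimensions, and the choice of logistic regression are assumptions made here
for illustration; they are not the specific algorithms of the talk.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def behavioral_cloning(expert_states, expert_actions):
        """Fit a policy by supervised learning on expert demonstrations.
        The expert's labels act as direct supervision: they replace random
        exploration and resolve credit assignment, since each visited
        state comes with the action the expert deems correct."""
        policy = LogisticRegression(max_iter=1000)
        policy.fit(expert_states, expert_actions)
        return policy

    # Toy demonstration data (purely illustrative).
    states = np.random.randn(500, 4)            # 500 states, 4 features
    actions = (states[:, 0] > 0).astype(int)    # "expert": threshold rule
    policy = behavioral_cloning(states, actions)
    print(policy.predict(states[:5]))

A known weakness of this one-shot reduction is distribution shift at test
time; interactive approaches such as DAgger [Ross et al., 2011] mitigate it
by querying the expert on states the learner itself visits.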

Since Imitation Learning is efficient, we next provide a general reduction
from RL to Imitation Learning, with a focus on applications where experts
are not available. We explore the possibility of learning local models and
then using off-the-shelf model-based RL solvers to compute an intermediate
“expert” for efficient policy improvement via imitation. Furthermore, we
present a general convergence analysis that generalizes, and provides a
theoretical foundation for, recent successful practical RL algorithms such
as ExIt and AlphaGo Zero [Anthony et al., 2017; Silver et al., 2017], and
that offers a theoretically sound and practically efficient way of unifying
model-based and model-free RL approaches.
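
The loop below is a toy, tabular sketch of that reduction: fit a local
model from the learner's own data, compute an intermediate "expert" with an
off-the-shelf model-based solver (value iteration here), and then imitate
that expert. Every concrete choice (the 5-state random MDP, known costs,
value iteration as the solver, exact policy copying as the imitation step)
is an assumption for this sketch, not the algorithm presented in the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    S, A, GAMMA = 5, 2, 0.9  # states, actions, discount

    # Ground-truth MDP, unknown to the learner: transitions and costs.
    P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] sums to 1
    c_true = rng.uniform(size=(S, A))

    def rollout(policy, n_steps):
        """Collect (s, a, s') transitions by running `policy` in the true MDP."""
        data, s = [], 0
        for _ in range(n_steps):
            a = policy[s]
            s2 = rng.choice(S, p=P_true[s, a])
            data.append((s, a, s2))
            s = s2
        return data

    def fit_local_model(data):
        """Estimate transition probabilities from counts (the local model)."""
        counts = np.ones((S, A, S))  # Laplace smoothing for unseen pairs
        for s, a, s2 in data:
            counts[s, a, s2] += 1
        return counts / counts.sum(axis=2, keepdims=True)

    def solve_model(P_hat):
        """Value iteration on the learned model: the intermediate 'expert'."""
        V = np.zeros(S)
        for _ in range(200):
            Q = c_true + GAMMA * (P_hat @ V)  # costs assumed known here
            V = Q.min(axis=1)
        return Q.argmin(axis=1)  # greedy cost-minimizing policy

    policy = rng.integers(A, size=S)  # arbitrary initial policy
    for _ in range(5):                # RL reduced to repeated imitation
        P_hat = fit_local_model(rollout(policy, n_steps=2000))
        policy = solve_model(P_hat)   # imitation is exact copying here
    print("final policy:", policy)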