[AI Seminar] AI Seminar sponsored by Apple -- Wen Sun -- April 10
Adams Wei Yu
weiyu at cs.cmu.edu
Mon Apr 9 03:26:01 EDT 2018
A gentle reminder that the talk will be tomorrow (Tuesday) at noon in NSH 3305.
On Sat, Apr 7, 2018 at 2:02 PM, Adams Wei Yu <weiyu at cs.cmu.edu> wrote:
> Dear faculty and students,
>
> We look forward to seeing you next Tuesday, April 10, at noon in NSH 3305 for
> the AI Seminar, sponsored by Apple. To learn more about the seminar series,
> please visit the AI Seminar webpage <http://www.cs.cmu.edu/~aiseminar/>.
>
> On Tuesday, Wen Sun <http://www.cs.cmu.edu/~wensun/> will give the
> following talk:
>
> Title: Efficient Reinforcement Learning via Imitation
>
> Abstract:
>
> A fundamental challenge in Artificial Intelligence (AI), robotics, and
> language processing is sequential prediction: to reason, plan, and make a
> sequence of predictions or decisions to minimize accumulated cost, achieve
> a long-term goal, or optimize for a loss acquired only after many
> predictions. Reinforcement Learning (RL), as a general framework for
> learning from experience to make predictions and decisions, is often
> considered one of the ideal tools for solving such challenges in AI.
> Recently, equipped with advances from the Deep Learning literature, we
> have advanced the state of the art of RL on a number of applications
> including simulated high-dimensional robotics control, video games, and
> board games (e.g., AlphaGo).
>
> RL is hard precisely because it is so general: it is a framework that
> encompasses many special machine learning algorithms and applications. As there
> is no direct supervision, one central challenge in RL is how to explore an
> unknown environment and collect useful feedback efficiently. Most recent RL
> success stories (e.g., super-human performance on video games [Mnih et al.,
> 2015]) rely on random exploration strategies, which usually require a huge
> number of interactions with the environment before the agent learns
> anything useful. Another challenge is credit
> assignment: if a learning agent successfully achieves some task after
> making a long sequence of decisions, how can we assign credit for the
> success among these decisions?
>
> We first attempt to gain purchase on RL problems by introducing an
> additional source of information: an expert who knows how to solve tasks
> (near) optimally. By imitating an expert, we can significantly reduce the
> burden of exploration (i.e., we imitate instead of exploring randomly), and
> solve the credit assignment problem (i.e., the expert tells us which
> decisions are bad). We study, both in theory and in practice, how one can
> imitate experts to reduce sample complexity compared to a pure RL approach.
>
> As Imitation Learning is efficient, we next provide a general reduction
> from RL to Imitation Learning with a focus on applications where experts
> are not available. We explore the possibilities of learning local models
> and then using off-the-shelf model-based RL solvers to compute an intermediate
> “expert” for efficient policy improvement via imitation. Furthermore, we
> present a general convergence analysis that generalizes and provides the
> theoretical foundation for recent successful practical RL algorithms such
> as ExIt and AlphaGo Zero [Anthony et al., 2017; Silver et al., 2017], and
> provides a theoretically sound and practically efficient way of unifying
> model-based and model-free RL approaches.