<div dir="ltr">A gentle reminder that the talk will be tomorrow (Tuesday) at noon in <b>NSH 1507</b>.</div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Feb 17, 2018 at 12:40 PM, Adams Wei Yu <span dir="ltr">&lt;<a href="mailto:weiyu@cs.cmu.edu" target="_blank">weiyu@cs.cmu.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">Dear faculty and students,</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><span style="font-weight:400">We look forward to seeing you next Tuesday, Feb 20, at noon in<span> </span></span><b>NSH 1507</b><span> </span>for the AI Seminar, sponsored by Apple. 
To learn more about the seminar series, please visit the AI Seminar <a href="http://www.cs.cmu.edu/~aiseminar/" style="color:rgb(17,85,204);font-weight:400" target="_blank">webpage</a>.</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><br></div><div style="text-align:start;text-indent:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-transform:none;white-space:normal;word-spacing:0px">On Tuesday,<span> </span></span><span style="font-size:12.8px"><a href="http://koppel.bitballoon.com/" target="_blank">Alec Koppel</a> will give the following talk: </span></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><br></div><div style="text-align:start;text-indent:0px;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-transform:none;white-space:normal;word-spacing:0px">Title: </span><span style="font-size:12.8px">Nonparametric Stochastic Methods for 
Continuous Reinforcement Learning</span></div><div><span style="font-size:12.8px"><br></span></div><div><div><div><span style="font-size:12.8px">Abstract: </span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Reinforcement learning is a generic framework to describe an autonomous agent seeking to learn behavior sequentially in uncertain environments based on rewards. This framework has gained increasing relevance for autonomous control, management science, and econometrics. Unfortunately, heuristics or intractably complicated tools are still prevalent when state and action spaces are continuous. In this talk, we develop new algorithms for estimating the value function or action-value function in continuous Markov Decision Problems (MDPs). At the core of these methods are nonparametric (kernelized) extensions of stochastic quasi-gradient methods operating in tandem with sparse subspace projections. The resulting tools yield the first convergence results for Value or Q-function estimation when these functions have an infinite nonlinear parameterization, addressing in the affirmative a long-standing open question posed by Tsitsiklis and Van Roy (1997). We then demonstrate on the classic Mountain Car domain that we can obtain comparable performance to existing approaches to TD or Q-learning with orders of magnitude fewer data samples and interpretable representations of the learned functions.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Biography: </span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Alec Koppel began as a Research Scientist at the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate in September of 2017. He completed his master's degree in Statistics and doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn) in August of 2017. 
He is also a participant in the Science, Mathematics, and Research for Transformation (SMART) Scholarship Program sponsored by the American Society for Engineering Education. Before coming to Penn, he completed his master's degree in Systems Science and Mathematics and bachelor's degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. His research interests are in the areas of signal processing, optimization, and learning theory. His current work focuses on optimization and learning methods for streaming data applications, with an emphasis on problems arising in autonomous systems. He co-authored a paper selected as a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers.</span></div></div></div></div></div>

</blockquote></div><br></div>