<div dir="ltr">Reminder: Keegan's talk is happening in about 30 minutes.</div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Feb 13, 2025 at 3:42 PM Victor Akinwande <<a href="mailto:vakinwan@andrew.cmu.edu">vakinwan@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Dear all,<br><br>We look forward to seeing you next <b>Tuesday (02/18) from 12:00-1:00 PM (ET)</b> for the next talk of CMU <span>AI</span> <span>Seminar</span>, sponsored by <a href="https://sambanova.ai/" target="_blank">SambaNova Systems</a>. The <span>seminar</span> will be held in <b>GHC 6115</b> with pizza provided and will be streamed on Zoom.<br><br>To learn more about the <span>seminar</span> series or to see the future schedule, please visit the <span>seminar</span> website (<a href="http://www.cs.cmu.edu/~aiseminar/" target="_blank">http://www.cs.cmu.edu/~aiseminar/</a>).<div><br></div><div><span style="background-color:rgb(255,255,0)">Next Tuesday (02/18) Keegan Harris (CMU) will be giving a talk titled: "Should You Use Your Large Language Model to Explore or Exploit?".</span></div><div><br></div><br><div><b>Abstract</b></div><div><div>In-context (supervised) learning is the ability of an LLM to perform new prediction tasks by conditioning on examples provided in the prompt, without any updates to internal model parameters. Although supervised learning is an important capability, many applications demand the use of ML models for downstream decision making. Thus, in-context reinforcement learning (ICRL) is a natural next frontier. In this talk, we investigate the extent to which contemporary LLMs can solve ICRL tasks. We begin by deploying LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context. We experiment with several frontier models and find that they do not engage in robust decision making behavior without substantial task-specific mitigations. Motivated by this observation, we then use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while the current generation of LLMs often struggle to exploit, in-context mitigations may be used to improve performance on small-scale tasks. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore. This talk is based on joint work with Alex Slivkins, Akshay Krishnamurthy, Dylan Foster, and Cyril Zhang. </div><div><br></div></div><div><u style="font-weight:700"><br>Speaker bio: </u><b></b></div><div><span>Keegan</span> Harris is a final-year Machine Learning PhD candidate at CMU, where he is advised by Nina Balcan and Steven Wu, and does research on machine learning for decision making. He has been recognized as a Rising Star in Data Science and his research is supported by an NDSEG Fellowship. He is also the head editor of the ML@CMU blog. 
Speaker bio:
Keegan Harris is a final-year Machine Learning PhD candidate at CMU, where he is advised by Nina Balcan and Steven Wu and does research on machine learning for decision making. He has been recognized as a Rising Star in Data Science, and his research is supported by an NDSEG Fellowship. He is also the head editor of the ML@CMU blog. Previously, Keegan spent two summers as an intern at Microsoft Research and graduated from Penn State with BS degrees in Computer Science and Physics.

In person: GHC 6115
Zoom link: https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1