<div dir="ltr">Dear all,<br><br>We look forward to seeing you next <b>Tuesday (02/18) from 12:00-1:00 PM (ET)</b> for the next talk of the CMU <span class="gmail-il">AI</span> <span class="gmail-il">Seminar</span>, sponsored by <a href="https://sambanova.ai/" target="_blank">SambaNova Systems</a>. The <span class="gmail-il">seminar</span> will be held in <b>GHC 6115</b> with pizza provided and will be streamed on Zoom.<br><br>To learn more about the <span class="gmail-il">seminar</span> series or to see the future schedule, please visit the <span class="gmail-il">seminar</span> website (<a href="http://www.cs.cmu.edu/~aiseminar/" target="_blank">http://www.cs.cmu.edu/~aiseminar/</a>).<div><br></div><div><span style="background-color:rgb(255,255,0)">Next Tuesday (02/18), Keegan Harris (CMU) will give a talk titled "Should You Use Your Large Language Model to Explore or Exploit?"</span></div><div><br></div><div><b>Abstract</b></div><div><div>In-context (supervised) learning is the ability of an LLM to perform new prediction tasks by conditioning on examples provided in the prompt, without any updates to internal model parameters. Although supervised learning is an important capability, many applications demand the use of ML models for downstream decision making. Thus, in-context reinforcement learning (ICRL) is a natural next frontier. In this talk, we investigate the extent to which contemporary LLMs can solve ICRL tasks. We begin by deploying LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context. We experiment with several frontier models and find that they do not engage in robust decision-making behavior without substantial task-specific mitigations. Motivated by this observation, we then use LLMs to explore and exploit in silos in various (contextual) bandit tasks.
We find that while the current generation of LLMs often struggles to exploit, in-context mitigations may be used to improve performance on small-scale tasks. On the other hand, we find that LLMs do help in exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore. This talk is based on joint work with Alex Slivkins, Akshay Krishnamurthy, Dylan Foster, and Cyril Zhang.</div><div><br></div></div><div><u style="font-weight:700">Speaker bio:</u></div><div><span class="gmail-il">Keegan</span> Harris is a final-year Machine Learning PhD candidate at CMU, where he is advised by Nina Balcan and Steven Wu and does research on machine learning for decision making. He has been recognized as a Rising Star in Data Science, and his research is supported by an NDSEG Fellowship. He is also the head editor of the ML@CMU blog. Previously, <span class="gmail-il">Keegan</span> spent two summers as an intern at Microsoft Research and graduated from Penn State with BS degrees in Computer Science and Physics.</div><div><br></div><div><div><b>In person: GHC 6115</b></div><div>Zoom Link:<b><font color="#0b5394"> <a href="https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1" target="_blank">https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1</a></font></b></div></div></div>