From vakinwan at andrew.cmu.edu Fri Jan 31 13:54:07 2025
From: vakinwan at andrew.cmu.edu (Victor Akinwande)
Date: Fri, 31 Jan 2025 13:54:07 -0500
Subject: Feb 04 at 12pm (GHC 6115) -- Yuchen Li (CMU) -- Towards Mathematical Understanding of Modern Language Models
Message-ID:

Dear all,

We look forward to seeing you next *Tuesday (02/04) from 12:00-1:00 PM (ET)* for the next talk of the CMU AI Seminar, sponsored by SambaNova Systems. The seminar will be held in *GHC 6115* with pizza provided and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule, please visit the seminar website (http://www.cs.cmu.edu/~aiseminar/).

Next Tuesday (02/04), Yuchen Li (CMU) will be giving a talk titled "Towards Mathematical Understanding of Modern Language Models".

*Abstract*
To mathematically reason about how neural networks learn languages, our methodology involves three major components: (1) mathematically characterizing key structures in language data distributions, (2) theoretically proving how neural networks capture such structures through self-supervision during pre-training, and (3) conducting controlled experiments using synthetic data. In this talk, I will survey a few applications of this methodology: understanding Transformer training dynamics through the lens of topic models, and proving pitfalls in common Transformer interpretability heuristics through the lens of a formal language (the Dyck grammar). These results illustrate some promises and challenges of this methodology. Finally, I will share some thoughts on key open questions.

Paper links:
1. Yuchen Li, Yuanzhi Li, and Andrej Risteski. How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding. ICML 2023. https://arxiv.org/abs/2303.04245
2. Kaiyue Wen, Yuchen Li, Bingbin Liu, and Andrej Risteski. Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. NeurIPS 2023.
https://arxiv.org/abs/2312.01429

*Speaker bio:*
Yuchen Li ( https://www.cs.cmu.edu/~yuchenl4/ ) is a Ph.D. student in the Machine Learning Department at Carnegie Mellon University, advised by Professor Andrej Risteski. Yuchen's research interest is in improving the mathematical understanding of language models (training dynamics, efficient sampling, mechanistic interpretability).

*In person: GHC 6115*
Zoom Link: https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vakinwan at andrew.cmu.edu Tue Feb 4 11:03:09 2025
From: vakinwan at andrew.cmu.edu (Victor Akinwande)
Date: Tue, 4 Feb 2025 11:03:09 -0500
Subject: Feb 04 at 12pm (GHC 6115) -- Yuchen Li (CMU) -- Towards Mathematical Understanding of Modern Language Models
In-Reply-To:
References:
Message-ID:

Reminder: Yuchen's talk is happening in less than an hour.

From vakinwan at andrew.cmu.edu Thu Feb 13 15:42:32 2025
From: vakinwan at andrew.cmu.edu (Victor Akinwande)
Date: Thu, 13 Feb 2025 15:42:32 -0500
Subject: Feb 18 at 12pm (GHC 6115) -- Keegan Harris (CMU) -- Should You Use Your Large Language Model to Explore or Exploit?
Message-ID:

Dear all,

We look forward to seeing you next *Tuesday (02/18) from 12:00-1:00 PM (ET)* for the next talk of the CMU AI Seminar, sponsored by SambaNova Systems. The seminar will be held in *GHC 6115* with pizza provided and will be streamed on Zoom.
To learn more about the seminar series or to see the future schedule, please visit the seminar website (http://www.cs.cmu.edu/~aiseminar/).

Next Tuesday (02/18), Keegan Harris (CMU) will be giving a talk titled "Should You Use Your Large Language Model to Explore or Exploit?".

*Abstract*
In-context (supervised) learning is the ability of an LLM to perform new prediction tasks by conditioning on examples provided in the prompt, without any updates to internal model parameters. Although supervised learning is an important capability, many applications demand the use of ML models for downstream decision making. Thus, in-context reinforcement learning (ICRL) is a natural next frontier. In this talk, we investigate the extent to which contemporary LLMs can solve ICRL tasks. We begin by deploying LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context. We experiment with several frontier models and find that they do not engage in robust decision-making behavior without substantial task-specific mitigations. Motivated by this observation, we then use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while the current generation of LLMs often struggles to exploit, in-context mitigations may be used to improve performance on small-scale tasks. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore. This talk is based on joint work with Alex Slivkins, Akshay Krishnamurthy, Dylan Foster, and Cyril Zhang.

*Speaker bio:*
Keegan Harris is a final-year Machine Learning PhD candidate at CMU, where he is advised by Nina Balcan and Steven Wu, and does research on machine learning for decision making. He has been recognized as a Rising Star in Data Science and his research is supported by an NDSEG Fellowship. He is also the head editor of the ML at CMU blog.
Previously, Keegan spent two summers as an intern at Microsoft Research and graduated from Penn State with BS degrees in Computer Science and Physics.

*In person: GHC 6115*
Zoom Link: https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1

From vakinwan at andrew.cmu.edu Tue Feb 18 11:30:28 2025
From: vakinwan at andrew.cmu.edu (Victor Akinwande)
Date: Tue, 18 Feb 2025 11:30:28 -0500
Subject: Feb 18 at 12pm (GHC 6115) -- Keegan Harris (CMU) -- Should You Use Your Large Language Model to Explore or Exploit?
In-Reply-To:
References:
Message-ID:

Reminder: Keegan's talk is happening in about 30 minutes.

From vakinwan at andrew.cmu.edu Thu Feb 20 11:29:21 2025
From: vakinwan at andrew.cmu.edu (Victor Akinwande)
Date: Thu, 20 Feb 2025 11:29:21 -0500
Subject: Feb 25 at 12pm (GHC 6115) -- Samy Bengio (Apple) -- How far can transformers reason? The globality barrier and inductive scratchpad
Message-ID:

Dear all,

We look forward to seeing you next *Tuesday (02/25) from 12:00-1:00 PM (ET)* for the next talk of the CMU AI Seminar, sponsored by SambaNova Systems. The seminar will be held in *GHC 6115* with pizza provided and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule, please visit the seminar website (http://www.cs.cmu.edu/~aiseminar/).

Next Tuesday (02/25), Samy Bengio (Apple) will be giving a talk titled "How far can transformers reason? The globality barrier and inductive scratchpad".

*Abstract:*
Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This presentation puts forward the notion of 'globality degree' to capture when weak learning is efficiently achievable by regular Transformers: it measures the least number of tokens required, in addition to the tokens histogram, to correlate nontrivially with the target. As shown experimentally, and theoretically under additional assumptions, distributions with high globality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Furthermore, we show that (i) an agnostic scratchpad cannot help to break the globality barrier, (ii) an educated scratchpad can help if it breaks the globality barrier at each step, and (iii) a notion of 'inductive scratchpad' can both break the globality barrier and improve out-of-distribution generalization, e.g., generalizing to almost double the input size for some arithmetic tasks.

*Speaker bio:*
Samy Bengio (PhD in computer science, University of Montreal, 1993) has been a senior director of machine learning research at Apple since 2021 and an adjunct professor at EPFL since 2024.
Before that, he was a distinguished scientist at Google Research (from 2007), where he headed part of the Google Brain team, and earlier at IDIAP in the early 2000s, where he co-wrote the well-known open-source Torch machine learning library. His research interests span many areas of machine learning, such as deep architectures, representation learning, vision and language processing, and, more recently, reasoning. He is an action editor of the Journal of Machine Learning Research and serves on the board of the NeurIPS Foundation. He was on the editorial board of the Machine Learning Journal, has been program chair (2017) and general chair (2018) of NeurIPS, program chair of ICLR (2015, 2016), general chair of BayLearn (2012-2015), MLMI (2004-2006), and NNSP (2002), and has served on the program committees of several international conferences such as NeurIPS, ICML, ICLR, ECML, and IJCAI. More details can be found at http://bengio.abracadoudou.com.

*In person: GHC 6115*
Zoom Link: https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1
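For readers who want a concrete handle on the bounded Dyck grammars discussed in Yuchen Li's talk above, here is a minimal, self-contained sketch of a bounded-depth Dyck validator and sampler. The function names and the sampling scheme are illustrative choices for this announcement, not code from the cited papers.

```python
import random

def is_dyck(s, max_depth=None):
    """Check membership in the Dyck language over '(' and ')'.
    If max_depth is given, also enforce the bound on nesting depth
    (the 'bounded Dyck' setting from the NeurIPS 2023 paper)."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
            if max_depth is not None and depth > max_depth:
                return False
        elif ch == ')':
            depth -= 1
            if depth < 0:       # closing bracket with no open partner
                return False
        else:
            return False        # alphabet is only '(' and ')'
    return depth == 0           # every open bracket must be closed

def sample_dyck(length, max_depth, rng=random):
    """Sample a bounded-depth Dyck word of the given (even) length,
    choosing uniformly between '(' and ')' whenever both are legal."""
    out, depth = [], 0
    for i in range(length):
        remaining = length - i
        # opening is legal if the depth bound holds and there is still
        # room to close everything afterward
        can_open = depth < max_depth and depth + 1 <= remaining - 1
        can_close = depth > 0
        if can_open and (not can_close or rng.random() < 0.5):
            out.append('(')
            depth += 1
        else:
            out.append(')')
            depth -= 1
    return ''.join(out)
```

Such synthetic data makes it easy to run the kind of controlled experiments the abstract describes: every training string is valid by construction, and the depth bound is an explicit knob.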
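The in-context bandit setup in Keegan Harris's abstract can likewise be made concrete: a Bernoulli multi-armed bandit whose interaction history is rendered entirely as text, the way it would be placed in an LLM prompt. The environment, prompt format, and agent here are illustrative assumptions, not the exact protocol from the talk; the LLM is stubbed out with a simple greedy baseline, and a real run would send `render_history(...)` to a model instead.

```python
import random

class BernoulliBandit:
    """K-armed bandit; arm k pays reward 1 with probability means[k]."""
    def __init__(self, means, rng=None):
        self.means = means
        self.rng = rng or random.Random()

    def pull(self, arm):
        return int(self.rng.random() < self.means[arm])

def render_history(history, n_arms):
    """Render the full interaction history as text, as one might place
    it in-context for an LLM asked to make the next decision."""
    lines = [f"You face a {n_arms}-armed bandit. Maximize total reward."]
    for t, (arm, reward) in enumerate(history, 1):
        lines.append(f"Round {t}: pulled arm {arm}, got reward {reward}.")
    lines.append("Which arm do you pull next? Answer with an arm index.")
    return "\n".join(lines)

def greedy_agent(history, n_arms):
    """Stand-in for the LLM: pull each arm once, then exploit the
    empirically best arm."""
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for arm, reward in history:
        pulls[arm] += 1
        wins[arm] += reward
    for arm in range(n_arms):
        if pulls[arm] == 0:       # explore: each arm gets one pull
            return arm
    return max(range(n_arms), key=lambda a: wins[a] / pulls[a])

def run(env, agent, rounds):
    history = []
    for _ in range(rounds):
        arm = agent(history, len(env.means))
        history.append((arm, env.pull(arm)))
    return history
```

The explore-versus-exploit tension in the talk title is visible even in this toy: the agent's behavior is determined entirely by the history it is shown, which is exactly what the in-context protocol hands to the model.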