Feb 04 at 12pm (GHC 6115) -- Yuchen Li (CMU) -- Towards Mathematical Understanding of Modern Language Models

Tue Feb 4 11:03:09 EST 2025

Reminder: Yuchen's talk is happening in less than an hour.

On Fri, Jan 31, 2025 at 1:54 PM Victor Akinwande <vakinwan at andrew.cmu.edu>
wrote:

> Dear all,
>
> We look forward to seeing you next *Tuesday (02/04) from 12:00-1:00 PM
> (ET)* for the next talk of the CMU AI Seminar, sponsored by SambaNova Systems
> <https://sambanova.ai/>. The seminar will be held in *GHC 6115* with
> pizza provided and will be streamed on Zoom.
>
> To learn more about the seminar series or to see the future schedule,
> please visit the seminar website (http://www.cs.cmu.edu/~aiseminar/).
>
> Next Tuesday (02/04) Yuchen Li (CMU) will be giving a talk titled:
> "Towards Mathematical Understanding of Modern Language Models".
>
> *Abstract*
> To mathematically reason about how neural networks learn languages, our
> methodology involves three major components: (1) mathematically
> characterizing key structures in language data distributions, (2)
> theoretically proving how neural networks capture such structures through
> self-supervision during pre-training, and (3) conducting controlled
> experiments using synthetic data. In this talk, I will survey a few
> applications of this methodology: understanding Transformer training
> dynamics via the lens of topic models, and proving pitfalls in common
> Transformer interpretability heuristics via the lens of a formal language
> (the Dyck grammar). These results illustrate some promises and
> challenges for this methodology. Finally, I will share some thoughts on key
> open questions.
> Paper links:
> 1. Yuchen Li, Yuanzhi Li, and Andrej Risteski. How Do Transformers Learn
> Topic Structure: Towards a Mechanistic Understanding. ICML 2023.
> https://arxiv.org/abs/2303.04245
> 2. Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski. Transformers are
> uninterpretable with myopic methods: a case study with bounded Dyck
> grammars. NeurIPS 2023. https://arxiv.org/abs/2312.01429
>
>
> *Speaker bio:*
> Yuchen Li (https://www.cs.cmu.edu/~yuchenl4/) is a Ph.D. student in
> the Machine Learning Department at Carnegie Mellon University, advised by
> Professor Andrej Risteski. Yuchen's research interest is in improving the
> mathematical understanding of language models (training dynamics, efficient
> sampling, mechanistic interpretability).
>
> *In person: GHC 6115*
> *Zoom Link:* https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1
>
>