Feb 04 at 12pm (GHC 6115) -- Yuchen Li (CMU) -- Towards Mathematical Understanding of Modern Language Models

Fri Jan 31 13:54:07 EST 2025

Dear all,

We look forward to seeing you next *Tuesday (02/04) from 12:00-1:00 PM (ET)*
for the next talk of the CMU AI Seminar, sponsored by SambaNova Systems
<https://sambanova.ai/>. The seminar will be held in *GHC 6115* with pizza
provided and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule,
please visit the seminar website (http://www.cs.cmu.edu/~aiseminar/).

Next Tuesday (02/04), Yuchen Li (CMU) will give a talk titled "Towards
Mathematical Understanding of Modern Language Models".

*Abstract*
To mathematically reason about how neural networks learn languages, our
methodology involves three major components: (1) mathematically
characterizing key structures in language data distributions, (2)
theoretically proving how neural networks capture such structures through
self-supervision during pre-training, and (3) conducting controlled
experiments using synthetic data. In this talk, I will survey a few
applications of this methodology: understanding Transformer training
dynamics via the lens of topic models, and proving pitfalls in common
Transformer interpretability heuristics via the lens of a formal language
(the Dyck grammar). These results illustrate some promises and
challenges for this methodology. Finally, I will share some thoughts on key
open questions.
Paper links:
1. Yuchen Li, Yuanzhi Li, and Andrej Risteski. How Do Transformers Learn
Topic Structure: Towards a Mechanistic Understanding. ICML 2023.
https://arxiv.org/abs/2303.04245
2. Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski. Transformers are
uninterpretable with myopic methods: a case study with bounded Dyck
grammars. NeurIPS 2023. https://arxiv.org/abs/2312.01429

*Speaker bio:*
Yuchen Li (https://www.cs.cmu.edu/~yuchenl4/) is a Ph.D. student in the
Machine Learning Department at Carnegie Mellon University, advised by
Professor Andrej Risteski. Yuchen's research interest is in improving the
mathematical understanding of language models (training dynamics, efficient
sampling, mechanistic interpretability).

*In person: GHC 6115*
*Zoom Link:* https://cmu.zoom.us/j/93599036899?pwd=oV45EL19Bp3I0PCRoM8afhKuQK7HHN.1