[CMU AI Seminar] Special! 🧐 March 25 at 3pm (GHC 6115 & Zoom) -- Sadhika Malladi (Princeton) -- Theory and Practice in Language Model Fine-Tuning -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Thu Mar 21 14:02:43 EDT 2024


Dear all,

We look forward to seeing you *this Monday (3/25)* from *3:00-4:00 PM
(U.S. Eastern time)* for a special installment of this semester's
*CMU AI Seminar*, sponsored by SambaNova Systems <https://sambanova.ai/>.
The seminar will be held in GHC 6115 *with pizza provided* and will be
streamed on Zoom.

To learn more about the seminar series or to see the future schedule,
please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.

This Monday (3/25), *Sadhika Malladi* (Princeton) will give a talk titled
*"Theory and Practice in Language Model Fine-Tuning"*.

*Title*: Theory and Practice in Language Model Fine-Tuning

*Talk Abstract*: Fine-tuning ever larger and more capable language models
(LMs) has proven to be an effective way to solve a variety of
language-related tasks. Yet little is understood about what fine-tuning does, and
most traditional optimization analyses cannot account for a pre-trained
initialization. I will start by formalizing the common intuition that
fine-tuning makes a small change to the model. Inspired by the neural
tangent kernel (NTK), we propose an empirically validated and theoretically
sound hypothesis that helps answer questions like "Why doesn't a
giant LM overfit when fine-tuning it on a few dozen examples?" and "Why
does LoRA work?" Our simple mental model motivates an efficient,
transferable, and optimizer-aware data selection algorithm, dubbed LESS, to
elicit specific capabilities during instruction tuning. Using LESS to
select 5% of the data outperforms training on the full dataset, and we can
also use a small model to select data for other models. Finally, I will describe how
insights into the dynamics of fine-tuning inspired us to design a
memory-efficient zeroth-order algorithm (MeZO) that can tune large LMs.
MeZO frequently matches the performance of standard fine-tuning while using
up to 12x less memory and half as many GPU-hours. These works were done in
collaboration with researchers at Princeton University and the University
of Washington.
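
For those curious ahead of the talk, below is a minimal sketch of the
SPSA-style two-forward-pass update that zeroth-order fine-tuning of the
kind described above relies on. It assumes a PyTorch model and a
hypothetical loss_fn(model, batch) helper; the function, hyperparameters,
and seed-reuse details are illustrative, not the authors' reference
implementation.

import torch

def spsa_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    """One SPSA-style zeroth-order update (illustrative sketch).

    Estimates the directional derivative of the loss along a random
    perturbation z using two forward passes, then updates the weights
    in place -- no backward pass or stored gradients are needed.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Sample the perturbation from a fixed seed so it can be regenerated
    # later instead of stored, which keeps extra memory near zero.
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same z from the seed and add scale * z in place.
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen).to(device=p.device, dtype=p.dtype)
            p.data.add_(z, alpha=scale)

    with torch.no_grad():
        perturb(+eps)                      # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2 * eps)                  # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+eps)                      # restore theta

        # Projected gradient estimate: (L(+) - L(-)) / (2 * eps)
        grad_proj = (loss_plus - loss_minus) / (2 * eps)

        # Update: theta <- theta - lr * grad_proj * z (z regenerated from the seed)
        perturb(-lr * grad_proj.item())

    return loss_plus.item()

Because the method never runs a backward pass and regenerates z from a
saved seed rather than storing it, it avoids keeping activations,
gradients, or optimizer state in memory, which is where savings of the
kind quoted in the abstract come from.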

*Speaker Bio:* Sadhika Malladi is a PhD student at Princeton University
advised by Sanjeev Arora. She has worked at OpenAI, Cerebras, and Microsoft
Research. She graduated from MIT in 2019 with a degree in mathematics and
computer science and a degree in philosophy. Her work focuses on the
interplay between theory and empirics, especially with respect to language
models.

*In person: *GHC 6115
*Zoom Link*:
https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman