Re: [CMU AI Seminar] Special! 🧐 March 25 at 3pm (GHC 6115 & Zoom) -- Sadhika Malladi (Princeton) -- Theory and Practice in Language Model Fine-Tuning -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Mon Mar 25 12:09:52 EDT 2024


🍕 This is happening today at 3pm! (There will be pizza.)

On Thu, Mar 21, 2024 at 2:02 PM Asher Trockman <ashert at cs.cmu.edu> wrote:

> Dear all,
>
> We look forward to seeing you *this Monday (3/25)* from *3:00-4:00 PM
> (U.S. Eastern time)* for a special installment of this semester's
> *CMU AI Seminar*, sponsored by SambaNova Systems <https://sambanova.ai/>.
> The seminar will be held in GHC 6115 *with pizza provided* and will be
> streamed on Zoom.
>
> To learn more about the seminar series or to see the future schedule,
> please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
>
> This Monday (3/25), *Sadhika Malladi* (Princeton) will be giving a
> talk titled *"Theory and Practice in Language Model Fine-Tuning"*.
>
> *Title*: Theory and Practice in Language Model Fine-Tuning
>
> *Talk Abstract*: Fine-tuning ever larger and more capable language models
> (LMs) has proven to be an effective way to solve a variety of language-related
> tasks. Yet little is understood about what fine-tuning does, and
> most traditional optimization analyses cannot account for a pre-trained
> initialization. I will start by formalizing the common intuition that
> fine-tuning makes a small change to the model. Inspired by the neural
> tangent kernel (NTK), we propose an empirically validated and theoretically
> sound hypothesis that helps answer questions like "Why doesn't a
> giant LM overfit when it is fine-tuned on a few dozen examples?" and "Why
> does LoRA work?" Our simple mental model motivates an efficient,
> transferable, and optimizer-aware data selection algorithm, dubbed LESS, to
> elicit specific capabilities during instruction tuning. Using LESS to
> select 5% of the data outperforms training on the full dataset, and we can also
> use a small model to select data for other models. Finally, I will describe how
> insights into the dynamics of fine-tuning inspired us to design a
> memory-efficient zeroth-order algorithm (MeZO) that can tune large LMs.
> MeZO frequently matches the performance of standard fine-tuning while using up
> to 12x less memory and half as many GPU-hours. These works were done in
> collaboration with researchers at Princeton University and the University of
> Washington.
>
> *Speaker Bio:* Sadhika Malladi is a PhD student at Princeton University
> advised by Sanjeev Arora. She has worked at OpenAI, Cerebras, and Microsoft
> Research. She graduated from MIT in 2019 with a degree in mathematics and
> computer science and a degree in philosophy. Her work focuses on the
> interplay between theory and empirics, especially with respect to language
> models.
>
> *In person: *GHC 6115
> *Zoom Link*:
> https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09
>
> Thanks,
> Asher Trockman
>

