Re: [CMU AI Seminar] ✨🐦 November 14 at 12pm (GHC 6115 & Zoom) -- Cyril Zhang (MSR) -- Overstepping the Descent Lemma -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Tue Nov 14 09:11:56 EST 2023


Reminder: this is happening today!

On Sat, Nov 11, 2023 at 5:43 PM Asher Trockman <ashert at cs.cmu.edu> wrote:

> Dear all,
>
> We look forward to seeing you *this Tuesday (11/14)* from *12:00-1:00
> PM (U.S. Eastern time)* for the next talk of this semester's
> *CMU AI Seminar*, sponsored by SambaNova Systems <https://sambanova.ai/>.
> The seminar will be held in GHC 6115 *with pizza provided* and will be
> streamed on Zoom.
>
> *🔜 Please email me if you would like to schedule a meeting with Cyril.*
>
> To learn more about the seminar series or to see the future schedule,
> please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
>
> On this Tuesday (11/14), *Cyril Zhang* (Microsoft Research) will be
> giving a talk titled *"Overstepping the Descent Lemma"*.
>
> *Title*: Overstepping the Descent Lemma
>
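> (For reference, the descent lemma that the title plays on: if f is
> L-smooth, a gradient step with step size \eta satisfies
>
>     f(x - \eta \nabla f(x)) \le f(x) - \eta \left(1 - \tfrac{\eta L}{2}\right) \|\nabla f(x)\|^2,
>
> so monotone local progress is guaranteed only for \eta < 2/L;
> "overstepping" means deliberately going beyond that threshold.)
>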
> *Talk Abstract*: What are the dynamics of gradient-based algorithms for
> optimizing neural networks? By what principles should we design update
> rules for deep learning? These are extremely messy questions, to which
> there are no canonical answers yet. In attempting to address these
> mysteries with our cherished theoretical frameworks, we face a recurring
> theme: a tension between over-idealization and intractability. We'll
> discuss how asking "non-standard" questions in clean theoretical models can
> shed light on weird, wonderful, and empirically-pertinent nuances of the
> trajectory of SGD:
>
>     • *Acceleration via large steps.* By staying within the paradise of
> low-noise convex quadratics, we show how making negative local progress can
> lead to faster global convergence, via a self-stabilizing “fractal”
> learning rate schedule (a toy numerical sketch follows this list).
>     • *Variance reduction without side effects.* We show how gradient
> stochasticity can cause catastrophic error amplification in the presence of
> feedback loops (as in offline RL or autoregressive language generation).
> Many variance reduction mechanisms help, but Polyak averaging is almost
> unreasonably effective; we discuss why it’s hard to analyze all these
> moving parts (a toy sketch of Polyak averaging follows this list).
>     • *Non-convex feature learning.* By taking a close look at how deep
> learning overcomes a "mildly cryptographic" computational obstruction
> (namely, learning a sparse parity), we arrive at a clean testbed for neural
> representation learning. With this microscopic proxy for a single neuron’s
> training dynamics, mysteries such as grokking, lottery tickets, and scaling
> laws become recognizable and analyzable (a toy sparse-parity data generator
> follows this list).
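>
> A toy numerical sketch of the first bullet's flavor (for concreteness only:
> this uses the classical two-step Chebyshev step-size schedule on a 2-D
> quadratic, not the fractal schedule from the talk, and every name and
> parameter below is illustrative). One of the two step sizes exceeds the
> descent-lemma threshold 2/L, so the loss spikes on that step, yet each
> two-step cycle contracts the error faster than any constant "safe" step
> size could:
>
> import numpy as np
>
> L, mu = 1.0, 0.1                      # largest / smallest curvature
> A = np.diag([L, mu])                  # f(x) = 0.5 * x^T A x
> f = lambda x: 0.5 * x @ A @ x
>
> # Two-step Chebyshev step sizes:
> # eta_k = 2 / (L + mu - (L - mu) * cos((2k - 1) * pi / 4)),  k = 1, 2
> etas = [2.0 / (L + mu - (L - mu) * np.cos((2 * k - 1) * np.pi / 4))
>         for k in (1, 2)]              # roughly [4.31, 1.15]; 4.31 > 2/L = 2
>
> x = np.array([1.0, 1.0])
> for t in range(20):
>     x = x - etas[t % 2] * (A @ x)     # gradient step
>     print(t, round(f(x), 8))          # f jumps up on the large steps, yet -> 0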
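>
> A toy sketch of Polyak (iterate) averaging itself, the mechanism named in
> the second bullet (illustrative only; the talk's setting involves feedback
> loops such as offline RL, which this one-dimensional example does not
> model). Plain SGD on a noisy quadratic keeps fluctuating around the
> optimum, while the running average of its iterates settles much closer
> to it:
>
> import numpy as np
>
> rng = np.random.default_rng(0)
> w, w_avg = 1.0, 0.0
> eta, sigma = 0.1, 1.0                 # step size, gradient-noise level
> for t in range(1, 10_001):
>     g = w + sigma * rng.normal()      # stochastic gradient of 0.5 * w**2
>     w -= eta * g                      # last SGD iterate: keeps fluctuating
>     w_avg += (w - w_avg) / t          # running (Polyak) average of iterates
> print(abs(w), abs(w_avg))             # the average lands much closer to 0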
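>
> And a minimal data generator for the k-sparse parity task named in the
> third bullet (this just pins down the task; it is not the talk's
> experimental setup, and the function name and defaults are made up):
>
> import numpy as np
>
> def sparse_parity_batch(n=50, k=3, batch=256, seed=0):
>     # Inputs: n random ±1 bits. Label: product (parity) of a hidden
>     # size-k subset of coordinates, which the learner must discover.
>     rng = np.random.default_rng(seed)
>     support = rng.choice(n, size=k, replace=False)   # hidden relevant bits
>     X = rng.choice([-1.0, 1.0], size=(batch, n))
>     y = X[:, support].prod(axis=1)                   # labels in {-1, +1}
>     return X, y, support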
>
> Another recurring theme is that hard mathematical questions in this space
> are more clearly exposed by running targeted numerical experiments,
> including training deep networks on GPUs. I’ll highlight some exciting
> progress that other groups have made in recent months.
>
> Joint work with Naman Agarwal, Surbhi Goel, Adam Block, Dylan Foster,
> Akshay Krishnamurthy, Max Simchowitz, Boaz Barak, Ben Edelman, Sham Kakade,
> and Eran Malach.
>
> *Speaker Bio:* Cyril Zhang <https://cyrilzhang.com> is a Senior
> Researcher at Microsoft Research NYC. He has worked on learning and control
> in dynamical systems, online & stochastic optimization, and (most recently)
> a nascent theoretical, scientific, and algorithmic toolbox for neural
> reasoning. He holds a Ph.D. in Computer Science from Princeton University.
>
> *In person: *GHC 6115
> *Zoom Link*:
> https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09
>
> Thanks,
> Asher Trockman
>