[CMU AI Seminar] ✨🐦 November 14 at 12pm (GHC 6115 & Zoom) -- Cyril Zhang (MSR) -- Overstepping the Descent Lemma -- AI Seminar sponsored by SambaNova Systems

Asher Trockman ashert at cs.cmu.edu
Sat Nov 11 17:43:19 EST 2023


Dear all,

We look forward to seeing you *this Tuesday (11/14)* from *12:00-1:00 PM
(U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*,
sponsored by SambaNova Systems <https://sambanova.ai/>. The seminar will be
held in GHC 6115 *with pizza provided* and will be streamed on Zoom.

*🔜 Please email me if you would like to schedule a meeting with Cyril.*

To learn more about the seminar series or to see the future schedule,
please visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.

This Tuesday (11/14), *Cyril Zhang* (Microsoft Research) will be giving a
talk titled *"Overstepping the Descent Lemma"*.

*Title*: Overstepping the Descent Lemma

*Talk Abstract*: What are the dynamics of gradient-based algorithms for
optimizing neural networks? By what principles should we design update
rules for deep learning? These are extremely messy questions, to which
there are no canonical answers yet. In attempting to address these
mysteries with our cherished theoretical frameworks, we face a recurring
theme: a tension between over-idealization and intractability. We'll
discuss how asking "non-standard" questions in clean theoretical models can
shed light on weird, wonderful, and empirically pertinent nuances of the
trajectory of SGD:

    • *Acceleration via large steps.* By staying within the paradise of
low-noise convex quadratics, we show how making negative local progress
can lead to faster global convergence, via a self-stabilizing “fractal”
learning rate schedule (toy sketch below).
    • *Variance reduction without side effects.* We show how gradient
stochasticity can cause catastrophic error amplification in the presence
of feedback loops (like in offline RL or autoregressive language
generation). Many variance reduction mechanisms help, but Polyak
averaging is almost unreasonably effective; we discuss why it’s hard to
analyze all these moving parts (toy sketch below).
    • *Non-convex feature learning.* By taking a close look at how deep
learning overcomes a "mildly cryptographic" computational obstruction
(namely, learning a sparse parity), we arrive at a clean testbed for
neural representation learning. With this microscopic proxy for a single
neuron’s training dynamics, mysteries such as grokking, lottery tickets,
and scaling laws are recognizable and analyzable (toy sketch below).
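
For anyone who wants something concrete to poke at before the talk, here
are a few tiny, unofficial toy sketches in Python. They are not taken
from Cyril's papers; every construction and hyperparameter below is a
placeholder chosen purely for illustration.

First, a minimal NumPy sketch of the classical Chebyshev step-size idea
behind the first bullet: on a simple quadratic, many of the prescribed
steps exceed the descent-lemma-safe threshold 2/L and the loss spikes
along the way, yet after all T steps the iterate lands far below the
constant-step baseline. (How the *ordering* of such steps controls
stability at larger scales is what the fractal schedule in the talk is
about; this sketch does not implement that schedule.)

    import numpy as np

    # Toy quadratic f(x) = 0.5 * x^T A x with spectrum in [mu, L] (diagonal WLOG).
    mu, L, d, T = 0.1, 1.0, 50, 16
    eigs = np.linspace(mu, L, d)
    x0 = np.ones(d)

    def loss(x):
        return 0.5 * np.sum(eigs * x**2)

    def run(step_sizes):
        x, history = x0.copy(), [loss(x0)]
        for eta in step_sizes:
            x = x - eta * (eigs * x)       # gradient step on the quadratic
            history.append(loss(x))
        return np.array(history)

    # Step sizes are the inverses of the roots of the degree-T Chebyshev
    # polynomial mapped onto [mu, L]; several of them exceed the
    # descent-lemma-safe threshold 2/L.
    k = np.arange(1, T + 1)
    roots = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * k - 1) * np.pi / (2 * T))
    cheb_steps = np.sort(1.0 / roots)[::-1]    # biggest ("overstepping") steps first

    # The maps (I - eta * A) commute, so the T-step endpoint is the same for
    # any ordering; ordering only shapes the intermediate blow-up (and, at
    # larger T and worse conditioning, numerical stability, which is where
    # the talk's fractal schedule comes in).
    hist_const = run(np.full(T, 1.0 / L))      # descent-lemma-safe baseline
    hist_cheb = run(cheb_steps)

    print("steps exceeding 2/L:", int(np.sum(cheb_steps > 2.0 / L)))
    print(f"worst intermediate loss (Chebyshev): {hist_cheb.max():.2e}")
    print(f"final loss, constant 1/L: {hist_const[-1]:.2e}  vs Chebyshev: {hist_cheb[-1]:.2e}")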
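
Second, loosely related to the second bullet, a minimal sketch of Polyak
(iterate) averaging by itself, on a noisy 1-D quadratic rather than the
feedback-loop settings from the talk: averaging the SGD iterates removes
most of the noise floor that the last iterate is stuck at.

    import numpy as np

    # Polyak (iterate) averaging for SGD on a noisy 1-D quadratic
    # f(theta) = 0.5 * (theta - theta_star)^2 with additive gradient noise.
    rng = np.random.default_rng(0)
    theta_star, eta, T, runs = 3.0, 0.1, 2000, 200

    last_errs, avg_errs = [], []
    for _ in range(runs):
        theta, avg = 0.0, 0.0
        for t in range(1, T + 1):
            g = (theta - theta_star) + rng.normal()   # stochastic gradient
            theta -= eta * g                          # plain SGD step
            avg += (theta - avg) / t                  # running average of iterates
        last_errs.append(abs(theta - theta_star))
        avg_errs.append(abs(avg - theta_star))

    print(f"mean |error| of the last iterate:   {np.mean(last_errs):.3f}")
    print(f"mean |error| of the Polyak average: {np.mean(avg_errs):.3f}")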
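
Third, a rough, untuned PyTorch sketch of what an (n, k)-sparse parity
testbed might look like, in the spirit of the third bullet: a small MLP
trained online on random +/-1 inputs whose label is the product of k
hidden coordinates. The hyperparameters here are arbitrary and may need
tuning before the characteristic long plateau and sharp drop show up.

    import torch

    torch.manual_seed(0)
    n, k, width, steps, batch = 40, 3, 256, 20000, 1024
    secret = torch.randperm(n)[:k]     # hidden subset of relevant coordinates

    model = torch.nn.Sequential(
        torch.nn.Linear(n, width), torch.nn.ReLU(), torch.nn.Linear(width, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(steps):
        x = torch.randint(0, 2, (batch, n)).float() * 2 - 1   # random +/-1 inputs
        y = x[:, secret].prod(dim=1, keepdim=True)             # parity label in {-1, +1}
        loss = torch.nn.functional.mse_loss(model(x), y)       # square loss on fresh samples
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 1000 == 0:
            acc = (model(x).sign() == y).float().mean().item()
            print(f"step {step:6d}  loss {loss.item():.3f}  batch accuracy {acc:.2f}")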

Another recurring theme is that hard mathematical questions in this space
are more clearly exposed by running targeted numerical experiments,
including training deep networks on GPUs. I’ll highlight some exciting
progress that other groups have made in recent months.

Joint work with Naman Agarwal, Surbhi Goel, Adam Block, Dylan Foster,
Akshay Krishnamurthy, Max Simchowitz, Boaz Barak, Ben Edelman, Sham Kakade,
and Eran Malach.

*Speaker Bio:* Cyril Zhang <https://cyrilzhang.com> is a Senior Researcher
at Microsoft Research NYC. He has worked on learning and control in
dynamical systems, online & stochastic optimization, and (most recently) a
nascent theoretical, scientific, and algorithmic toolbox for neural
reasoning. He holds a Ph.D. in Computer Science from Princeton University.

*In person: *GHC 6115
*Zoom Link*:
https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman