[CMU AI Seminar] Nov 2 at 12pm (Zoom) -- Jeremy Cohen (CMU MLD) -- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability -- AI Seminar sponsored by Morgan Stanley

Tue Nov 2 12:00:00 EDT 2021

Hi all,

The seminar today by Jeremy Cohen on "Gradient Descent on Neural Networks
Typically Occurs at the Edge of Stability" is happening now!

In case you are interested:
https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09

Best,
Asher

On Wed, Oct 27, 2021 at 10:56 AM Asher Trockman <ashert at cs.cmu.edu> wrote:

> Dear all,
>
> We look forward to seeing you *next Tuesday (11/2)* from *12:00-1:00 PM
> (U.S. Eastern time)* for the next talk of our *CMU AI Seminar*, sponsored
> by Morgan Stanley <https://www.morganstanley.com/about-us/technology/>.
>
> To learn more about the seminar series or see the future schedule, please
> visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
>
> On 11/2, *Jeremy Cohen* (CMU MLD) will be giving a talk on "*Gradient
> Descent on Neural Networks Typically Occurs at the Edge of Stability*".
>
> *Title:* Gradient Descent on Neural Networks Typically Occurs at the Edge
> of Stability
>
> *Talk Abstract:* Neural networks are trained using optimization
> algorithms. While we sometimes understand how these algorithms behave in
> restricted settings (e.g. on quadratic or convex functions), very little is
> known about the dynamics of these optimization algorithms on real neural
> objective functions. In this paper, we take a close look at the simplest
> optimization algorithm—full-batch gradient descent with a fixed step
> size—and find that its behavior on neural networks is both (1) surprisingly
> consistent across different architectures and tasks, and (2) surprisingly
> different from that envisioned in the "conventional wisdom."
>
> In particular, we empirically demonstrate that during gradient descent
> training of neural networks, the maximum Hessian eigenvalue (the
> "sharpness") always rises all the way to the largest stable value, which is
> 2/(step size), and then hovers just *above* that numerical value for the
> remainder of training, in a regime we term the "Edge of Stability." (Click
> here <https://twitter.com/deepcohen/status/1366881479175847942> for a 1m
> 17s animation.) At the Edge of Stability, the sharpness is still "trying"
> to increase further—and that's what happens if you drop the step size—but
> is somehow being actively restrained from doing so, by the implicit
> dynamics of the optimization algorithm. Our findings have several
> implications for the theory of neural network optimization. First, whereas
> the conventional wisdom in optimization says that the sharpness ought to
> determine the step size, our paper shows that in the topsy-turvy world of
> deep learning, the reality is precisely the opposite: the *step size*
> wholly determines the *sharpness*. Second, our findings imply that
> convergence analyses based on L-smoothness, or on ensuring monotone
> descent, do not apply to neural network training.
>
> *Speaker Bio:* Jeremy Cohen is a PhD student in the Machine Learning
> Department at CMU, co-advised by Zico Kolter and Ameet Talwalkar. His
> research focus is "neural network plumbing": how to initialize and
> normalize neural networks so that they train quickly and generalize well.
>
> *Zoom Link:*
> https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09
>
> Thanks,
> Asher Trockman
>
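
For readers who want to see where the 2/(step size) threshold in the abstract comes from, here is a minimal sketch on a one-dimensional quadratic. It is not code from the paper. The function name run_gd, the objective f(x) = 0.5 * sharpness * x^2, and the step size 0.1 are all assumptions chosen for illustration. On this objective each gradient descent update multiplies x by (1 - step_size * sharpness), so the iterates shrink when the sharpness is below 2/(step size) and blow up when it is above.

```python
def run_gd(sharpness, step_size, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2.

    Returns |x| after `steps` updates. The second derivative of f is
    `sharpness` everywhere, so it plays the role of the maximum Hessian
    eigenvalue in this toy setting (an illustrative assumption).
    """
    x = x0
    for _ in range(steps):
        grad = sharpness * x          # f'(x) = sharpness * x
        x = x - step_size * grad      # fixed-step gradient descent update
    return abs(x)


step_size = 0.1
threshold = 2.0 / step_size           # largest stable sharpness for this step size

# Sharpness slightly below the threshold: iterates contract toward 0.
below = run_gd(0.9 * threshold, step_size)

# Sharpness slightly above the threshold: iterates oscillate and grow.
above = run_gd(1.1 * threshold, step_size)

print(f"threshold = {threshold}")
print(f"|x| after 50 steps, sharpness below threshold: {below:.3e}")
print(f"|x| after 50 steps, sharpness above threshold: {above:.3e}")
```

On a quadratic the sharpness is fixed, so this only shows the stability boundary itself. The talk's claim concerns neural network objectives, where the sharpness changes during training and is reported to settle near this boundary.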