[CMU AI Seminar] Nov 2 at 12pm (Zoom) -- Jeremy Cohen (CMU MLD) -- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability -- AI Seminar sponsored by Morgan Stanley

Asher Trockman ashert at cs.cmu.edu
Mon Nov 1 15:31:22 EDT 2021


Hi all,

Just a reminder that the CMU AI Seminar <http://www.cs.cmu.edu/~aiseminar/>
is tomorrow, *12pm-1pm* (U.S. Eastern time):
https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09.

*Jeremy Cohen (CMU MLD)* will be giving a talk on the surprising dynamics
of full-batch gradient descent on neural networks.

Thanks,
Asher


On Wed, Oct 27, 2021 at 10:56 AM Asher Trockman <ashert at cs.cmu.edu> wrote:

> Dear all,
>
> We look forward to seeing you *next Tuesday (11/2)* from *12:00-1:00 PM
> (U.S. Eastern time)* for the next talk of our *CMU AI Seminar*, sponsored
> by Morgan Stanley <https://www.morganstanley.com/about-us/technology/>.
>
> To learn more about the seminar series or see the future schedule, please
> visit the seminar website <http://www.cs.cmu.edu/~aiseminar/>.
>
> On 11/2, *Jeremy Cohen* (CMU MLD) will be giving a talk on "*Gradient
> Descent on Neural Networks Typically Occurs at the Edge of Stability*".
>
> *Title:* Gradient Descent on Neural Networks Typically Occurs at the Edge
> of Stability
>
> *Talk Abstract:* Neural networks are trained using optimization
> algorithms. While we sometimes understand how these algorithms behave in
> restricted settings (e.g. on quadratic or convex functions), very little is
> known about the dynamics of these optimization algorithms on real neural
> objective functions. In this paper, we take a close look at the simplest
> optimization algorithm—full-batch gradient descent with a fixed step
> size—and find that its behavior on neural networks is both (1) surprisingly
> consistent across different architectures and tasks, and (2) surprisingly
> different from that envisioned in the "conventional wisdom."
>
> In particular, we empirically demonstrate that during gradient descent
> training of neural networks, the maximum Hessian eigenvalue (the
> "sharpness") always rises all the way to the largest stable value, which is
> 2/(step size), and then hovers just *above* that numerical value for the
> remainder of training, in a regime we term the "Edge of Stability." (Click
> here <https://twitter.com/deepcohen/status/1366881479175847942> for a 1m
> 17s animation.) At the Edge of Stability, the sharpness is still "trying"
> to increase further—and that's what happens if you drop the step size—but
> is somehow being actively restrained from doing so, by the implicit
> dynamics of the optimization algorithm. Our findings have several
> implications for the theory of neural network optimization. First, whereas
> the conventional wisdom in optimization says that the sharpness ought to
> determine the step size, our paper shows that in the topsy-turvy world of
> deep learning, the reality is precisely the opposite: the *step size*
> wholly determines the *sharpness*. Second, our findings imply that
> convergence analyses based on L-smoothness, or on ensuring monotone
> descent, do not apply to neural network training.
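>
> To make the 2/(step size) threshold concrete: for a quadratic objective
> with curvature lambda, gradient descent with step size eta contracts only
> when lambda < 2/eta, which is why 2/(step size) is the largest stable
> sharpness. Below is a minimal sketch (illustrative only, not code from the
> paper) of full-batch gradient descent in PyTorch that estimates the
> sharpness with a power-iteration Hessian-vector-product helper and prints
> it next to 2/eta; the model, the data, and the sharpness() helper are
> stand-ins chosen for brevity.
>
>   import torch
>
>   def sharpness(loss, params, iters=20):
>       # Estimate the top Hessian eigenvalue via power iteration on
>       # Hessian-vector products (illustrative helper, not from the paper).
>       grads = torch.autograd.grad(loss, params, create_graph=True)
>       v = [torch.randn_like(p) for p in params]
>       for _ in range(iters):
>           hv = torch.autograd.grad(
>               sum((g * u).sum() for g, u in zip(grads, v)),
>               params, retain_graph=True)
>           norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
>           v = [h / (norm + 1e-12) for h in hv]
>       hv = torch.autograd.grad(
>           sum((g * u).sum() for g, u in zip(grads, v)),
>           params, retain_graph=True)
>       # Rayleigh quotient v^T H v with ||v|| = 1
>       return sum((h * u).sum() for h, u in zip(hv, v)).item()
>
>   eta = 0.01  # fixed step size; the stability threshold is 2 / eta = 200
>   model = torch.nn.Sequential(
>       torch.nn.Linear(10, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
>   X, y = torch.randn(256, 10), torch.randn(256, 1)  # stand-in "full batch"
>
>   for step in range(2000):
>       loss = torch.nn.functional.mse_loss(model(X), y)
>       if step % 100 == 0:
>           lam = sharpness(loss, list(model.parameters()))
>           print(f"step {step}: loss {loss.item():.4f}, "
>                 f"sharpness {lam:.1f}, 2/eta = {2 / eta:.1f}")
>       grads = torch.autograd.grad(loss, model.parameters())
>       with torch.no_grad():
>           for p, g in zip(model.parameters(), grads):
>               p -= eta * g  # full-batch gradient descent, fixed step size
>
> Per the abstract, on real architectures the printed sharpness rises until
> it reaches roughly 2/eta and then hovers there; dropping eta lets it keep
> rising. How pronounced this is on a toy problem like the one above will
> vary.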
>
> *Speaker Bio: *Jeremy Cohen is a PhD student in the Machine Learning
> Department at CMU, co-advised by Zico Kolter and Ameet Talwalkar. His
> research focus is "neural network plumbing": how to initialize and
> normalize neural networks so that they train quickly and generalize well.
>
> *Zoom Link:*
> https://cmu.zoom.us/j/96099846691?pwd=NEc3UjQ4aHJ5dGhpTHpqYnQ2cnNaQT09
>
> Thanks,
> Asher Trockman
>

