<div dir="ltr"><div>This is a reminder that this talk is tomorrow, Tuesday, September 13th.</div><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Ellen Vitercik</b> <span dir="ltr">&lt;<a href="mailto:vitercik@cs.cmu.edu">vitercik@cs.cmu.edu</a>&gt;</span><br>Date: Wed, Sep 7, 2016 at 12:57 PM<br>Subject: AI Lunch -- Stephan Mandt -- September 13th, 2016<br>To: <a href="mailto:ai-seminar-announce@cs.cmu.edu">ai-seminar-announce@cs.cmu.edu</a>, Stephan Mandt &lt;<a href="mailto:stephan.mandt@disneyresearch.com">stephan.mandt@disneyresearch.com</a>&gt;<br><br><br><div dir="ltr">Dear faculty and students,<br><br>We look forward to seeing you this Tuesday, September 13th, at noon in NSH 3305 for AI lunch. To learn more about the seminar and lunch, please visit the <a href="http://www.cs.cmu.edu/~aiseminar/" target="_blank">AI Lunch webpage</a>.<div><br>On Tuesday, <a href="http://www.stephanmandt.com" target="_blank">Stephan Mandt</a>, a Research Scientist at Disney Research Pittsburgh, will give a talk titled "Variational Inference: From Artificial Temperatures to Stochastic Gradients."<br><br><b>Abstract:</b> Bayesian modeling is a popular approach to solving machine learning problems. In this talk, we will first review variational inference, where we map Bayesian inference to an optimization problem. This optimization problem is non-convex, meaning that there are many local optima that correspond to poor fits of the data. We first show that by introducing a “local temperature” to every data point and applying the machinery of variational inference, we can avoid some of these poor optima, suppress the effects of outliers, and ultimately find more meaningful patterns. In the second part of the talk, we will then present a Bayesian view on Stochastic Gradient Descent (SGD). 
When operated with a constant, non-decreasing learning rate, SGD first marches towards the optimum of the objective and then samples from a stationary distribution that is centered around the optimum. As such, SGD resembles Markov Chain Monte Carlo (MCMC) algorithms which, after a burn-in period, draw samples from a Bayesian posterior. Drawing on the tools of variational inference, we investigate and formalize this connection. Our analysis reveals criteria that allow us to use SGD as an approximate scalable MCMC algorithm that can compete with more complicated state-of-the-art Bayesian approaches.<br><font color="#5856d6"><br></font><b>Speaker bio: </b>Stephan Mandt is a Research Scientist at Disney Research Pittsburgh where he leads the statistical machine learning group. Previously, he was a postdoctoral researcher with David Blei at Columbia University, where he worked on scalable approximate Bayesian inference algorithms. Trained as a statistical physicist, he held a previous postdoctoral fellowship at Princeton University and holds a Ph.D. from the University of Cologne as a fellow of the German National Merit Foundation. <div><br></div><div>Personal website: <a href="http://www.stephanmandt.com/" target="_blank">www.stephanmandt.com</a></div><div><br></div>Best,<br>Ellen and Ariel<br></div></div>

</div><br></div>