From ashert at cs.cmu.edu Sat Feb 4 17:42:09 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sat, 4 Feb 2023 17:42:09 -0500 Subject: [CMU AI Seminar] February 7 at 12pm (GHC 6115 & Zoom) -- Saurabh Garg (CMU) -- Domain Adaptation under Relaxed Label Shift -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (2/7)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website. This Tuesday (2/7), *Saurabh Garg* (CMU MLD) will be giving a talk titled *"Domain Adaptation under Relaxed Label Shift".* *Title*: Domain Adaptation under Relaxed Label Shift *Talk Abstract*: Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class-conditional distributions remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale *relaxed label shift* benchmark consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known.
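As background on the label shift setting above: the classical correction for shifted label proportions rescales a classifier's predicted class probabilities by the ratio of target to source class priors and renormalizes. A minimal sketch of that textbook adjustment (background only, not the method from the talk; the priors here are assumed known rather than estimated):

```python
import numpy as np

def adjust_for_label_shift(probs, source_prior, target_prior):
    """Rescale predicted class probabilities by the target/source
    label-prior ratio, then renormalize (classical prior correction)."""
    w = np.asarray(target_prior) / np.asarray(source_prior)
    adjusted = np.asarray(probs) * w
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```

For example, a perfectly uninformative prediction of [0.5, 0.5] under a balanced source prior is pulled toward a target prior of [0.9, 0.1].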
Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) *pseudo-balance* the data at each epoch; and (ii) adjust the final classifier with (an estimate of) the target label distribution. The meta-algorithm often improves existing domain adaptation heuristics by 2-10% accuracy points under extreme label proportion shifts and has little (i.e., <0.5%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. *Speaker Bio:* Saurabh Garg is a fourth-year Ph.D. student in the Machine Learning Department at Carnegie Mellon University, advised by Zachary Lipton and Sivaraman Balakrishnan. Saurabh is interested in building robust and deployable machine learning systems. The primary focus of his research is to improve and evaluate deep learning models in the face of distribution shifts. Before Saurabh started his Ph.D., he received his bachelor's degree from the Indian Institute of Technology (IIT) Bombay, majoring in Computer Science and Engineering. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Tue Feb 7 11:40:35 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Tue, 7 Feb 2023 11:40:35 -0500 Subject: [CMU AI Seminar] February 7 at 12pm (GHC 6115 & Zoom) -- Saurabh Garg (CMU) -- Domain Adaptation under Relaxed Label Shift -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder this is happening soon in GHC 6115. From ashert at cs.cmu.edu Sun Feb 12 12:10:16 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 12 Feb 2023 12:10:16 -0500 Subject: [CMU AI Seminar] February 14 at 12pm (GHC 6115 & Zoom) -- Uri Alon (CMU) -- Natural Language Reasoning with Language Models of Code -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, Please join us for this Valentine's Day (*Tuesday, 2/14*) installment of the *CMU AI Seminar Series* to learn about some research we love. The seminar will be held in *GHC 6115* from *12:00-1:00 PM (U.S. Eastern time)* *with pizza provided* and will be streamed on Zoom. It is sponsored by SambaNova Systems. To learn more about the seminar series or to see the future schedule, please visit the seminar website.
This Tuesday (2/14), *Uri Alon* (CMU LTI) will be giving a talk titled *"Natural Language Reasoning with Language Models of Code".* *Title*: Natural Language Reasoning with Language Models of Code *Talk Abstract*: In this talk, I will show that LMs that were pretrained on *code* can be better natural language reasoners than LMs that were trained (mostly) on natural language, even when the task does not involve source code at all. In a class of structured NL reasoning tasks, I will show how we can frame the task as code generation; this makes LMs of code such as Codex better reasoners than LMs of natural language such as T5 and GPT-3. Another class of mathematical reasoning tasks was recently unlocked by methods that require LLMs to generate their explicit reasoning steps, such as "chain-of-thought" (Wei et al., 2022). Such methods employ LMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem. While LMs seem adept at the step-by-step decomposition part, they often make logical and arithmetic mistakes in the solution part. I will show how LMs of *code* can decompose the natural language problem into runnable steps, which allows us to offload the solution to a programmatic runtime such as a Python interpreter. That is, instead of learning to solve the problem directly, we teach the model to generate a program that solves the problem. Across a variety of benchmarks, this approach leads to more accurate results than much larger models such as PaLM-540B using chain-of-thought. *Speaker Bio:* Uri Alon is a Postdoctoral Researcher at LTI, working with Prof. Graham Neubig on NLP and learning from source code. Previously, he obtained his PhD at the Technion (Israel), where he worked on modeling programming languages and graphs. Currently, he is also interested in the synergy of neural models with symbolic components such as retrieval, programs, and automata.
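To make the abstract's "offload the solution to a programmatic runtime" idea concrete, here is a toy sketch; the "generated" program is mocked with a hard-coded string rather than produced by an actual LM, and the word problem is a standard illustrative example, not one from the talk:

```python
def run_generated_program(program: str):
    """Execute a (model-generated) Python program and read off its
    `answer` variable; the Python runtime does the arithmetic."""
    namespace = {}
    exec(program, namespace)
    return namespace["answer"]

# Mocked model output for: "Roger has 5 balls and buys 2 cans of 3 balls each."
generated = """
balls = 5
balls = balls + 2 * 3
answer = balls
"""
```

Calling `run_generated_program(generated)` returns 11, with the arithmetic done by the interpreter rather than the model.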
His personal website is at https://urialon.ml. Feel free to reach out with any questions or comments about the talk. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman From ashert at andrew.cmu.edu Tue Feb 14 11:37:37 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 14 Feb 2023 11:37:37 -0500 Subject: [CMU AI Seminar] UPDATE: In GHC 4405. February 14 at 12pm (GHC 6115 & Zoom) -- Uri Alon (CMU) -- Natural Language Reasoning with Language Models of Code -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: <123D7D8C-AF46-4820-BFE8-DAE8A58A3E5A@andrew.cmu.edu> An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sun Feb 19 14:37:53 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 19 Feb 2023 14:37:53 -0500 Subject: [CMU AI Seminar] February 21 at 12pm (GHC 6115 & Zoom) -- Chao Wang (USC) -- Differential Verification of Deep Neural Networks -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (2/21)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. *Note:* The speaker will be remote. To learn more about the seminar series or to see the future schedule, please visit the seminar website. This Tuesday (2/21), *Chao Wang* (USC) will be giving a talk titled *"Differential Verification of Deep Neural Networks".* *Title*: Differential Verification of Deep Neural Networks *Talk Abstract*: Deep neural networks have become an integral component of many systems for which ensuring safety and robustness is crucial.
In this talk, we present several abstract-interpretation-based methods for efficient verification of a class of safety properties called differential properties. While we focus on neural network equivalence as the canonical example, other interesting properties concerning input sensitivity and stability can also be cast as differential properties. Our key insight is in deriving sound abstractions that relate the intermediate computations of two structurally similar neural networks, to accurately bound their maximum difference over all inputs. We also propose bound synthesis techniques for automatically generating linear abstractions of arbitrary nonlinear functions, to more efficiently handle architectures and activation functions beyond feed-forward ReLU networks. *Speaker Bio:* Chao Wang is an Associate Professor of Computer Science at the University of Southern California (USC). He develops formal verification and program synthesis techniques for principled design of systems to improve safety and security. He has published two books and more than 100 papers. The awards and recognition he has received include a Young Investigator award from the U.S. Office of Naval Research (ONR), a CAREER award from the National Science Foundation (NSF), two ACM SIGSOFT Distinguished Paper awards, and a Best Journal Paper of the Year award from ACM Transactions on Design Automation of Electronic Systems. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman From ashert at cs.cmu.edu Tue Feb 21 16:31:19 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Tue, 21 Feb 2023 16:31:19 -0500 Subject: [CMU AI Seminar] Special! February 23 at 2pm (NSH 3305 & Zoom) -- Ludwig Schmidt (U.
Washington) -- A data-centric view on reliable generalization: From ImageNet to LAION-5B -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Thursday (2/23)* from *2:00-3:00 PM (U.S. Eastern time)* for a special installment of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in *NSH 3305* with *pizza provided* and will be streamed on Zoom. *Note:* The speaker will be *in person*. To learn more about the seminar series or to see the future schedule, please visit the seminar website. This Thursday (2/23), *Ludwig Schmidt* (University of Washington) will be giving a talk titled *"A data-centric view on reliable generalization: From ImageNet to LAION-5B".* *Title*: A data-centric view on reliable generalization: From ImageNet to LAION-5B *Talk Abstract*: Researchers have proposed many methods to make neural networks more reliable under distribution shift, yet there is still large room for improvement. Are better training algorithms or training data the more promising way forward? In this talk, we study this question in the context of OpenAI's CLIP model for learning from image-text data. First, we survey the current robustness landscape based on a large-scale experimental study involving more than 200 different models and test conditions. The CLIP models stand out with unprecedented robustness on multiple challenging distribution shifts. To further improve CLIP, we then introduce new methods for reliably fine-tuning models by interpolating the weights of multiple models. Next, we investigate the cause of CLIP's robustness via controlled experiments to disentangle the influence of language supervision and training distribution. While CLIP leveraged large-scale language supervision for the first time, its robustness actually comes from the pre-training dataset.
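The "interpolating the weights of multiple models" step mentioned in the abstract can be pictured with a minimal sketch; the two-entry parameter dictionaries below are hypothetical stand-ins for full model checkpoints, and this is only the generic weight-averaging idea, not the exact fine-tuning procedure from the talk:

```python
import numpy as np

def interpolate_checkpoints(state_a, state_b, alpha=0.5):
    """Parameter-wise linear interpolation of two checkpoints;
    assumes both share the same parameter names and shapes."""
    return {name: (1 - alpha) * state_a[name] + alpha * state_b[name]
            for name in state_a}
```

With `alpha=0` this recovers the first checkpoint and with `alpha=1` the second; intermediate values mix them per parameter.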
We conclude with a brief overview of ongoing work to improve pre-training datasets: LAION-5B, the largest public image-text dataset, and initial experiments to increase the robustness induced by pre-training data. *Speaker Bio:* Ludwig Schmidt is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Ludwig's research interests revolve around the empirical foundations of machine learning, often with a focus on datasets, reliable generalization, and large models. Ludwig completed his PhD at MIT under the supervision of Piotr Indyk and was a postdoc at UC Berkeley hosted by Benjamin Recht and Moritz Hardt. Recently, Ludwig's research group contributed to multimodal language & vision models by creating OpenCLIP and the LAION-5B dataset. Ludwig's research received a New Horizons award at EAAMO, best paper awards at ICML & NeurIPS, a best paper finalist at CVPR, and the Sprowls dissertation award from MIT. *In person:* NSH 3305 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman From ashert at cs.cmu.edu Thu Feb 23 13:29:52 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Thu, 23 Feb 2023 13:29:52 -0500 Subject: [CMU AI Seminar] Special! February 23 at 2pm (NSH 3305 & Zoom) -- Ludwig Schmidt (U. Washington) -- A data-centric view on reliable generalization: From ImageNet to LAION-5B -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder this is happening at 2pm in NSH 3305! From ashert at cs.cmu.edu Sun Feb 26 16:20:49 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 26 Feb 2023 16:20:49 -0500 Subject: [CMU AI Seminar] February 28 at 12pm (GHC 6115 & Zoom) -- Steven Jecmen (CMU) -- Off-Policy Evaluation of Peer-Review Assignment Strategies -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (2/28)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website. This Tuesday (2/28), *Steven Jecmen* (CMU) will be giving a talk titled *"Off-Policy Evaluation of Peer-Review Assignment Strategies".* *Title*: Off-Policy Evaluation of Peer-Review Assignment Strategies *Talk Abstract*: Peer-review assignment algorithms aim to match research papers to suitable expert reviewers, working to maximize the quality of the resulting reviews.
A key challenge in designing effective assignment strategies is evaluating how changes to the assignment algorithm map to changes in review quality. In this talk, I will show how we leverage recently proposed strategies that introduce randomness into peer-review assignment (aiming to mitigate fraud) as a valuable opportunity to evaluate counterfactual assignment strategies. To address challenges in applying standard off-policy evaluation techniques, we introduce novel methods for partial identification based on mild assumptions about the review quality outcomes. We apply our methods to peer-review data from two computer science venues and examine the effect on review quality of various changes to the paper assignment algorithms. *Speaker Bio:* Steven Jecmen is a fourth-year Ph.D. student in the Computer Science Department at Carnegie Mellon University, advised by Fei Fang and Nihar Shah. Steven's recent research focuses primarily on preventing undesirable behavior in settings with human evaluators, particularly in academic peer review. His personal website is at https://sjecmen.github.io/. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman From ashert at cs.cmu.edu Sun Mar 26 11:51:42 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 26 Mar 2023 11:51:42 -0400 Subject: [CMU AI Seminar] March 28 at 12pm (GHC 6115 & Zoom) -- Siddharth Prasad (CMU) -- Bicriteria Multidimensional Mechanism Design with Side Information -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (3/28)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom.
To learn more about the seminar series or to see the future schedule, please visit the seminar website. This Tuesday (3/28), *Siddharth Prasad* (CMU) will be giving a talk titled *"Bicriteria Multidimensional Mechanism Design with Side Information".* *Title*: Bicriteria Multidimensional Mechanism Design with Side Information *Talk Abstract*: We develop a versatile new methodology for multidimensional mechanism design that incorporates side information about agent types with the bicriteria goal of generating high social welfare and high revenue simultaneously. Side information can come from a variety of sources (examples include advice from a domain expert, predictions from a machine-learning model trained on historical agent data, or even the mechanism designer's own gut instinct), and in practice such sources are abundant. In this work we adopt a prior-free perspective that makes no assumptions on the correctness, accuracy, or source of the side information. First, we design a meta-mechanism that integrates input side information with an improvement of the classical VCG mechanism. The welfare, revenue, and incentive properties of our meta-mechanism are characterized by a number of novel constructions we introduce based on the notion of a *weakest competitor*, which is an agent that has the smallest impact on welfare. We then show that our meta-mechanism, when carefully instantiated, simultaneously achieves strong welfare and revenue guarantees that are parameterized by errors in the side information. When the side information is highly informative and accurate, our mechanism achieves welfare and revenue competitive with the total social surplus, and its performance decays continuously and gradually as the quality of the side information decreases. Finally, we apply our meta-mechanism to a setting where each agent's type is determined by a constant number of parameters.
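As background on the classical VCG mechanism the abstract builds on: in the single-item special case, VCG reduces to the Vickrey second-price auction, where each winner pays the welfare its presence costs the other agents. A minimal sketch of that special case (background only, not the talk's meta-mechanism, which handles side information and multidimensional types):

```python
def vcg_single_item(bids):
    """Single-item special case of classical VCG: the highest bidder wins
    and pays the externality imposed on the others, i.e. the
    second-highest bid (Vickrey auction). Requires at least two bids."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    payment = max(b for i, b in enumerate(bids) if i != winner)
    return winner, payment
```

With bids [3, 7, 5], bidder 1 wins and pays 5: truthful bidding is a dominant strategy because the payment does not depend on the winner's own bid.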
Specifically, agent types lie on constant-dimensional subspaces (of the potentially high-dimensional ambient type space) that are known to the mechanism designer. We use our meta-mechanism to obtain the first known welfare and revenue guarantees in this setting. *Speaker Bio:* Siddharth Prasad is a fourth-year PhD student in the Computer Science Department at Carnegie Mellon University advised by Nina Balcan and Tuomas Sandholm. His research interests span machine learning, integer programming, mechanism design, algorithms, and their various interactions. He was a student researcher at Google Research during Summer 2022, hosted by Craig Boutilier and Martin Mladenov. He received a B.S. in math and computer science from Caltech in 2019. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman From ashert at cs.cmu.edu Tue Mar 28 11:06:09 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Tue, 28 Mar 2023 11:06:09 -0400 Subject: [CMU AI Seminar] March 28 at 12pm (GHC 6115 & Zoom) -- Siddharth Prasad (CMU) -- Bicriteria Multidimensional Mechanism Design with Side Information -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder that this is happening today! From ashert at cs.cmu.edu Mon Apr 3 09:26:35 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Mon, 3 Apr 2023 09:26:35 -0400 Subject: [CMU AI Seminar] April 4 at 12pm (GHC 6115 & Zoom) -- Rattana Pukdee (CMU) -- Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (4/4)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website.
This Tuesday (4/4), *Rattana Pukdee* (CMU) will be giving a talk titled *"Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games".* *Title*: Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games *Talk Abstract*: In this talk, we will look at the problem of learning adversarially robust models from the perspective of a two-player zero-sum game. We will show that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best-response strategy (which resembles adversarial training) of such a game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments, showing the non-convergence of adversarial training and the robustness of the Nash equilibrium. *Speaker Bio:* Rattana Pukdee is a second-year PhD student in the Machine Learning Department at Carnegie Mellon University, working with Nina Balcan and Pradeep Ravikumar. His current research interests are in learning with domain knowledge and reliable machine learning. Previously, he received a master's degree in mathematics from the University of Oxford. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman From ashert at cs.cmu.edu Tue Apr 4 11:37:19 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Tue, 4 Apr 2023 11:37:19 -0400 Subject: [CMU AI Seminar] April 4 at 12pm (GHC 6115 & Zoom) -- Rattana Pukdee (CMU) -- Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder that this is happening soon.
-------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashert at cs.cmu.edu Mon Apr 10 07:57:33 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Mon, 10 Apr 2023 07:57:33 -0400 Subject: [CMU AI Seminar] April 11 at 12pm (GHC 6115 & Zoom) -- Lucio Dery (CMU) -- An automated transfer learning approach to tackling learning under limited data -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (4/11)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website . This Tuesday (4/11), *Lucio Dery* (CMU) will be giving a talk titled *"**An automated transfer learning approach to tackling learning under limited data**".* *Title*: An automated transfer learning approach to tackling learning under limited data *Talk Abstract*: Transfer learning is arguably the engine of the current deep learning revolution in machine learning. A common branch of transfer learning is learning with auxiliary objectives: supplementary learning signals that are introduced to aid learning on data-starved or highly complex end-tasks. Whilst much work has been done to formulate useful auxiliary objectives, their construction is still an art which proceeds by slow and tedious hand-design. Intuition for how and when these objectives improve end-task performance has also had limited theoretical backing. In this talk, I will present a task-agnostic approach for automatically generating a suite of auxiliary objectives and maximally utilizing them to benefit a specified end-task. We achieve this by deconstructing existing objectives within a novel unified taxonomy, identifying connections between them, and generating new ones based on the uncovered structure. 
We theoretically formalize widely-held intuitions about how auxiliary learning improves generalization on the end-task, which leads us to a principled and efficient algorithm for searching the space of generated objectives to find those most useful to a specified end-task. *Speaker Bio:* Lucio Dery is a PhD student in the Computer Science Department at Carnegie Mellon University, co-advised by Ameet Talwalkar and Graham Neubig. Before starting his PhD, he was a Research Engineer at Facebook AI Research (FAIR) in Seattle. His current research interests broadly cover all things related to learning from multiple tasks: transfer learning, meta-learning, multi-tasking and auxiliary learning. He primarily explores these fields in the context of Natural Language Processing, but the tools he develops are domain-agnostic. *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at andrew.cmu.edu Tue Apr 11 11:35:55 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 11 Apr 2023 11:35:55 -0400 Subject: [CMU AI Seminar] April 11 at 12pm (GHC 6115 & Zoom) -- Lucio Dery (CMU) -- An automated transfer learning approach to tackling learning under limited data -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Thu Apr 13 08:06:31 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Thu, 13 Apr 2023 08:06:31 -0400 Subject: [CMU AI Seminar] April 13 (Today!) at 12pm (GHC 6115 & Zoom) -- Mazda Moayeri (UMD) -- Turning Models into Super Models without Supersizing: Making the most of what we already have -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you* today, **this Thursday (4/13)* from *1**2:00-1:00 PM (U.S. 
Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. *Note: the speaker will be remote.* To learn more about the seminar series or to see the future schedule, please visit the seminar website . Today (4/13), *Mazda Moayeri* (UMD) will be giving a talk titled *"**Turning Models into Super Models without Supersizing: Making the most of what we already have**".* *Title*: Turning Models into Super Models without Supersizing: Making the most of what we already have *Talk Abstract*: Newer, larger vision models trained on more data are released nearly every week. However, many critical issues persist, such as poor interpretability, robustness, and fairness. Today, we ask, how can we better use the models and data we already have to tackle these issues? First, we propose a method to organize data for improved spurious correlation robustness: We utilize an adversarially trained model to discover spurious features that models rely upon, and scalably measure the presence of these spurious cues (i.e. spuriosity) per image. After ranking images by spuriosity, we can very easily measure and mitigate bias caused by spurious correlations, all without needing new data. We demonstrate the feasibility of our framework on ImageNet, resulting in a massive dataset (salient-imagenet.cs.umd.edu) answering the question, "what reasons do deep models use to solve ImageNet classification?". Next, we show how existing models can work together with minimal additional training. Namely, we present a method for accessing the feature space of an off-the-shelf vision model directly with text, extending multimodal (i.e. CLIP) capabilities to smaller unimodal models trained with far less data and supervision. Our method unlocks many new powers (especially for interpretability) of existing models, all without ever needing to change the model. 
For example, we show how after training just one linear layer, we can use a basic ResNet to retrieve images using text, diagnose distribution shifts w.r.t. human concepts, and even perform zero-shot classification nearly on par with CLIP. *Speaker Bio:* Mazda Moayeri is a third-year PhD student in the Computer Science Department at the University of Maryland, advised by Dr. Soheil Feizi. His research focuses on building practical, efficient methods to improve the reliability and trustworthiness of AI. Having worked on adversarial and distributional robustness, his work now combines interpretability with robustness to diagnose distribution shifts and tailor mitigation strategies specific to the uncovered vulnerabilities. He is supported by the ARCS foundation as their Endowment Scholar and will be hosted by FAIR in the summer. *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Thu Apr 13 11:56:24 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Thu, 13 Apr 2023 11:56:24 -0400 Subject: [CMU AI Seminar] April 13 (Today!) at 12pm (GHC 6115 & Zoom) -- Mazda Moayeri (UMD) -- Turning Models into Super Models without Supersizing: Making the most of what we already have -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder this is happening now. There's pizza. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Mon Apr 17 15:40:08 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Mon, 17 Apr 2023 15:40:08 -0400 Subject: [CMU AI Seminar] April 18 at 12pm (GHC 6115 & Zoom) -- Sayan Mitra (UIUC) -- Assuring Safety of Learning-Enabled Systems with Perception Contracts -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (4/18)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. *Note: the speaker will be remote.* To learn more about the seminar series or to see the future schedule, please visit the seminar website . 
This Tuesday (4/18), *Sayan Mitra* (UIUC) will be giving a talk titled *"**Assuring Safety of Learning-Enabled Systems with Perception Contracts**".* *Title*: Assuring Safety of Learning-Enabled Systems with Perception Contracts *Talk Abstract*: Formal verification of deep learning models remains challenging, and yet they are becoming integral in many safety-critical autonomous systems. We present an invariance and abstraction-based method for reasoning about end-to-end safety of such learning-enabled systems. The method constructs approximations of the DNN or perception models, called perception contracts, using system-level safety requirements and program analysis. Mathematically proving that a given perception model conforms to a contract remains a challenge, and may well be impractical, but empirical measures of conformance can provide confidence to safety claims. The resulting contracts are low-dimensional, intelligible, and can be used to verify end-to-end safety. We will discuss vision-based lane keeping and several other ongoing applications of this method and related future research challenges. *Speaker Bio:* Sayan is a Professor and John Bardeen Faculty Scholar of ECE at UIUC. Sayan received his PhD from MIT and his research is on formal verification and safe autonomy. His group is well-known for developing algorithms for data-driven verification and synthesis, some of which are being commercialized. His textbook on verification of cyber-physical systems was published in 2021. Former PhD students from his group are now professors at Vanderbilt, UNC Chapel Hill, MIT, and WashU. The group's work has been recognized with the ACM Doctoral Dissertation Award, NSF CAREER Award, AFOSR YIP, and several best paper awards. *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashert at andrew.cmu.edu Tue Apr 18 10:08:22 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 18 Apr 2023 10:08:22 -0400 Subject: Correction! [CMU AI Seminar] April 18 at 12pm (NSH 3305 & Zoom) -- Sayan Mitra (UIUC) -- Assuring Safety of Learning-Enabled Systems with Perception Contracts -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Thu Apr 20 10:41:50 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Thu, 20 Apr 2023 10:41:50 -0400 Subject: [CMU AI Seminar] April 20 (Today!) at 12pm (GHC 6115 & Zoom) -- Bingbin Liu (CMU) -- Thinking Fast with Transformers: Algorithmic Reasoning via Shortcuts -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you* today, **this Thursday (4/20)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website . Today (4/20), *Bingbin Liu* (CMU) will be giving a talk titled *"**Thinking Fast with Transformers: Algorithmic Reasoning via Shortcuts**".* *Title*: Thinking Fast with Transformers: Algorithmic Reasoning via Shortcuts *Talk Abstract*: Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are these shallow and non-recurrent models finding? 
In this talk, we will formalize reasoning in the setting of automata, and show that the computation of an automaton on an input sequence of length T can be replicated exactly by Transformers with o(T) layers, which we call "shortcuts". We provide two constructions, with O(log T) layers for all automata and O(1) layers for solvable automata. Empirically, our results from synthetic experiments show that shallow solutions can also be found in practice. *Speaker Bio:* Bingbin Liu is a fourth-year PhD student at the Machine Learning Department of Carnegie Mellon University, co-advised by Pradeep Ravikumar and Andrej Risteski. Her research focuses on the theoretical understanding of self-supervised learning and unsupervised learning, often motivated by findings in vision and language. *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Thu Apr 20 11:50:59 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Thu, 20 Apr 2023 11:50:59 -0400 Subject: [CMU AI Seminar] April 20 (Today!) at 12pm (GHC 6115 & Zoom) -- Bingbin Liu (CMU) -- Thinking Fast with Transformers: Algorithmic Reasoning via Shortcuts -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder this is happening in ~10m! (Sorry for the late notice) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashert at cs.cmu.edu Sun Apr 23 21:40:38 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 23 Apr 2023 21:40:38 -0400 Subject: [CMU AI Seminar] April 25 at 12pm (NSH 3305 & Zoom) -- Xinyi Chen (Princeton) -- A Nonstochastic Control Approach to Optimization -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (4/25)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in NSH 3305 *with pizza provided *and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website . This Tuesday (4/25), *Xinyi Chen* (Princeton) will be giving a talk titled *"**A Nonstochastic Control Approach to Optimization**".* *Title*: A Nonstochastic Control Approach to Optimization *Talk Abstract*: Selecting the best hyperparameters for a particular optimization instance, such as the learning rate and momentum, is an important but nonconvex problem. As a result, iterative optimization methods such as hypergradient descent lack global optimality guarantees in general. We propose an online nonstochastic control methodology for mathematical optimization. First, we formalize the setting of meta-optimization, an online learning formulation of learning the best optimization algorithm from a class of methods. The meta-optimization problem over gradient-based methods can be framed as a feedback control problem over the choice of hyperparameters, including the learning rate, momentum, and the preconditioner. Although the original optimal control problem is nonconvex, we show how recent methods from online nonstochastic control using convex relaxations can be used to circumvent the nonconvexity, and obtain regret guarantees vs. the best offline solution. 
This guarantees that in meta-optimization, given a sequence of optimization problems, we can learn a method that attains convergence comparable to that of the best optimization method in hindsight from a class of methods. *Speaker Bio:* Xinyi Chen is a fourth-year Ph.D. student in the Computer Science department at Princeton University, advised by Prof. Elad Hazan. Her research is at the intersection of online learning, optimization, and control. Previously, she obtained her undergraduate degree from Princeton in Mathematics, where she received the Middleton Miller Prize. She is a recipient of the NSF Graduate Research Fellowship and a participant in EECS Rising Stars at UC Berkeley. *In person: *NSH 3305 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Fri Sep 15 13:53:36 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Fri, 15 Sep 2023 13:53:36 -0400 Subject: [CMU AI Seminar] September 19 at 12pm (GHC 6115 & Zoom) -- Keegan Harris (CMU) -- Algorithmic Decision-Making under Incentives: Apple Tasting Feedback and Multiclass Learnability -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (9/19)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website . 
On this Tuesday (9/19), *Keegan Harris* (CMU) will be giving a talk titled *"**Algorithmic Decision-Making under Incentives: Apple Tasting Feedback and Multiclass Learnability**".* *Title*: Algorithmic Decision-Making under Incentives: Apple Tasting Feedback and Multiclass Learnability *Talk Abstract*: Algorithmic systems have recently been used to aid in or automate decision-making in high-stakes domains in order to, e.g. improve efficiency or reduce human bias. When subjected to decision-making in these settings, decision-subjects (or agents) have an incentive to strategically modify their observable attributes in order to appear more qualified. Moreover, in many domains of interest (e.g. lending and hiring), the decision-maker only observes feedback if they assign a positive decision to an agent; this type of feedback is often referred to as apple tasting (or one-sided) feedback. In the first part of the talk, we examine the effects of apple tasting feedback in the online (binary) strategic classification setting. We provide several algorithms which achieve sublinear regret with respect to the best fixed policy in hindsight if the agents were truthful (i.e. non-strategic). We also show how our results may be easily adapted to the setting where the decision-maker receives bandit feedback. Next, we shift our focus to the multiclass extension of strategic classification. Despite being well-motivated in settings such as e-commerce and medical domains, the multiclass version of the problem has received relatively little attention in the current literature on classification under incentives. Perhaps somewhat surprisingly, we show that unlike in the binary setting, strategyproof multiclass classification is generally not possible, even when full feedback is observed. 
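The one-sided feedback model described in the abstract above can be sketched in a few lines. This is an invented illustration, not the talk's algorithm: it only shows how accepting is the sole way to observe an agent's true label, so the observed sample is biased toward agents who cleared the threshold.

```python
import random

# Illustrative sketch (not from the talk) of "apple tasting" feedback:
# the decision-maker observes an agent's true label only on a positive
# decision. Rejected agents reveal nothing, unlike full-information
# online classification. Scores, labels, and thresholds are made up.

random.seed(0)

def apple_tasting_round(agent_score, threshold, true_label):
    """Accept iff the score clears the threshold; feedback only on accepts."""
    accept = agent_score >= threshold
    feedback = true_label if accept else None  # one-sided feedback
    return accept, feedback

observed = []
for _ in range(1000):
    score = random.random()
    label = score > 0.7          # toy ground truth tied to the score
    accept, fb = apple_tasting_round(score, 0.5, label)
    if fb is not None:
        observed.append(fb)

# Only accepted agents ever contribute feedback, so the observed labels
# are a biased sample of the population.
print(len(observed), sum(observed) / len(observed))
```

The regret guarantees in the talk concern exactly this setting, with the added wrinkle that agents strategically shift their scores.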
This talk is based on two recent preprints: https://arxiv.org/pdf/2306.06250.pdf, https://arxiv.org/pdf/2211.14236.pdf *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at andrew.cmu.edu Tue Sep 19 10:13:30 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 19 Sep 2023 10:13:30 -0400 Subject: [CMU AI Seminar] September 19 at 12pm (GHC 6115 & Zoom) -- Keegan Harris (CMU) -- Algorithmic Decision-Making under Incentives: Apple Tasting Feedback and Multiclass Learnability -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sat Sep 30 17:13:23 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sat, 30 Sep 2023 17:13:23 -0400 Subject: [CMU AI Seminar] October 3 at 12pm (GHC 6115 & Zoom) -- Nikhil Ghosh (UC Berkeley) -- Hyperparameter Transfer for Finetuning Large-Scale Models -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (10/3)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website . On this Tuesday (10/3), *Nikhil Ghosh* (UC Berkeley) will be giving a talk titled *"**Hyperparameter Transfer for Finetuning Large-Scale Models**".* *Title*: Hyperparameter Transfer for Finetuning Large-Scale Models *Talk Abstract*: Current models have become so large that most practitioners are unable to effectively tune hyperparameters due to limited computational resources, which results in suboptimal performance. 
In this talk I will be discussing ongoing work which aims to address this issue by transferring the optimal learning rate from smaller models. This work builds on previous ideas of Yang et al. (2022), which achieves hyperparameter transfer for pretraining large models. In the current work, we aim to study the same problem but in the finetuning setting. By reducing the width of a pretrained model via random subsampling and rescaling according to the muP parameterization of Yang et al., we obtain a smaller proxy model which we can finetune with significantly fewer resources. In certain settings, such as when finetuning using LoRA on large datasets, the optimal learning rate is preserved under subsampling, which allows for immediate transfer to larger models. In general, however, we find through both experiments and theoretical calculations that the optimal learning rate can display a rich variety of scaling behaviors. Characterizing the scaling behavior requires understanding more fine-grained aspects of training and generalization. *Speaker Bio:* Nikhil Ghosh is a PhD student in the Statistics department at UC Berkeley working with Bin Yu and Song Mei. His main interests are in the theory of deep learning. Previously he studied computer science at Caltech and has completed internships at Google and Microsoft Research. *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... 
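The width-reduction idea in the abstract above can be sketched for a single linear layer. This is a rough sketch under an assumption: the sqrt(n/m) rescale below is just the variance-preserving choice for one layer, not necessarily the talk's exact muP rescaling, which is more involved and layer-dependent.

```python
import numpy as np

# Hypothetical sketch of building a narrow "proxy" from a pretrained
# weight matrix by randomly subsampling input units. The sqrt(n/m)
# factor keeps pre-activation variance roughly fixed for one linear
# layer; the talk's muP-based rescaling may differ.

rng = np.random.default_rng(0)

def subsample_linear(W, m, rng):
    """Keep m of W's n input columns and rescale to preserve output scale.

    W: (d_out, n) pretrained weights. Returns (d_out, m) proxy weights
    plus the kept column indices, so inputs can be subsampled to match.
    """
    n = W.shape[1]
    idx = rng.choice(n, size=m, replace=False)
    # A sum of m i.i.d. terms has m/n the variance of a sum of n terms,
    # so multiply by sqrt(n/m) to restore the original scale.
    return W[:, idx] * np.sqrt(n / m), idx

W = rng.normal(size=(64, 1024))      # stand-in for a pretrained layer
x = rng.normal(size=1024)
W_small, idx = subsample_linear(W, 256, rng)
y_full, y_small = W @ x, W_small @ x[idx]
print(W_small.shape, float(np.std(y_full)), float(np.std(y_small)))
```

The proxy is 4x narrower here; the learning rate tuned on it is what the talk's method would attempt to transfer back to the full-width model.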
URL: From ashert at andrew.cmu.edu Tue Oct 3 09:53:41 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 3 Oct 2023 09:53:41 -0400 Subject: [CMU AI Seminar] October 3 at 12pm (GHC 6115 & Zoom) -- Nikhil Ghosh (UC Berkeley) -- Hyperparameter Transfer for Finetuning Large-Scale Models -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: <28F9ABF2-039D-48A2-9E43-44929A5F3BE5@andrew.cmu.edu> An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Fri Oct 6 16:54:30 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Fri, 6 Oct 2023 16:54:30 -0400 Subject: [CMU AI Seminar] October 10 at 12pm (GHC 6115 & Zoom) -- Jane Lange (MIT) -- Agnostic Proper Learning of Monotone Functions: Beyond the Black-Box Correction Barrier -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (10/10)* from *1**2:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems . The seminar will be held in GHC 6115 *with pizza provided *and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website . On this Tuesday (10/10), *Jane Lange* (MIT) will be giving a talk titled *"**Agnostic Proper Learning of Monotone Functions: Beyond the Black-Box Correction Barrier**" *to present an agnostic, efficient, proper learning algorithm for monotone Boolean functions. *Title*: Agnostic Proper Learning of Monotone Functions: Beyond the Black-Box Correction Barrier *Talk Abstract*: We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given 2^{Õ(sqrt(n)/ε)} uniformly random examples of an unknown function f: {±1}^n → {±1}, our algorithm outputs a hypothesis g: {±1}^n → {±1} that is monotone and (opt+ε)-close to f, where opt is the distance from f to the closest monotone function. 
The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also 2^{Õ(sqrt(n)/ε)}, nearly matching the lower bound of Blais et al. (RANDOM '15). We also give an algorithm for estimating, up to additive error ε, the distance of an unknown function f to monotone using a run-time of 2^{Õ(sqrt(n)/ε)}. Previously, for both of these problems, sample-efficient algorithms were known, but these algorithms were not run-time efficient. Our work thus closes this gap in our knowledge between the run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then "corrects" it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than 2opt+ε information-theoretically; we bypass this barrier by a) augmenting the improper learner with a convex optimization step, and b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the "poset sorting" problem of [LRV22] for functions over general posets with non-Boolean labels. *Speaker Bio:* Jane Lange is a PhD student at MIT CSAIL studying theoretical computer science under Ronitt Rubinfeld. She is broadly interested in algorithms related to machine learning and property testing. More specifically, she is interested in the application of techniques from the fields of sublinear algorithms and analysis of Boolean functions for the purpose of designing efficient algorithms for learning, property testing, and other similar problems. *In person: *GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashert at andrew.cmu.edu Tue Oct 10 10:27:01 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 10 Oct 2023 10:27:01 -0400 Subject: [CMU AI Seminar] October 10 at 12pm (GHC 6115 & Zoom) -- Jane Lange (MIT) -- Agnostic Proper Learning of Monotone Functions: Beyond the Black-Box Correction Barrier -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: <68729F7F-105A-424D-AFB6-45ABEE0B3B9E@andrew.cmu.edu> An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sat Oct 14 13:51:03 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sat, 14 Oct 2023 13:51:03 -0400 Subject: [CMU AI Seminar] October 17 at 12pm (GHC 6115 & Zoom) -- Nicholas Roberts (UW Madison) -- Geometry-Aware Adaptation for Pretrained Models -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (10/17)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Tuesday (10/17), *Nicholas Roberts* (UW Madison) will be giving a talk titled *"Geometry-Aware Adaptation for Pretrained Models"*. *Title*: Geometry-Aware Adaptation for Pretrained Models *Talk Abstract*: Machine learning models, including prominent zero-shot models, are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes (or, in the case of zero-shot prediction, to improve its performance) without any additional training.
Our technique is a drop-in replacement for the standard prediction rule, swapping arg max with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, LOKI, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP. *Speaker Bio:* Nicholas Roberts is a third-year Ph.D. student at the University of Wisconsin-Madison advised by Frederic Sala. This past summer, he completed an internship with the Physics of AGI research group at Microsoft Research led by Sebastien Bubeck, working on large language models. Previously, he completed his M.S. in the Machine Learning Department at CMU, working with Ameet Talwalkar and Zack Lipton. Nicholas's research is motivated by the need to democratize machine learning and foundation models to handle the long tail of emerging ML tasks in the sciences and beyond. In order to use these models to solve high-impact problems in the sciences, his work aims to solve two main challenges: (1) determine what additional data to provide them and understand how it interacts with pretraining data, and (2) automate the process of adapting them to new problems.
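The prediction-rule swap the abstract describes (replace arg max over scores with a Fréchet mean under the label metric) can be sketched in a few lines. This is a hypothetical illustration, not the authors' LOKI implementation; the function name and the toy line metric are invented for the example.

```python
# Hypothetical sketch of a geometry-aware prediction rule in the spirit of
# the abstract above: instead of arg max over model scores, return the
# probability-weighted Frechet mean of the label space under a given metric.
import math

def frechet_mean_predict(scores, candidates, dist):
    """scores: dict mapping an observed class -> predicted probability.
    candidates: full label space (may include classes the model never saw).
    dist: metric on labels, dist(a, b) -> float.
    Returns the candidate minimizing the probability-weighted sum of
    squared distances to the scored classes."""
    best, best_cost = None, math.inf
    for y in candidates:
        cost = sum(p * dist(y, c) ** 2 for c, p in scores.items())
        if cost < best_cost:
            best, best_cost = y, cost
    return best

# With labels on a line (metric = absolute difference), mass split evenly
# between classes 0 and 2 pulls the prediction to the unobserved class 1.
print(frechet_mean_predict({0: 0.5, 2: 0.5}, [0, 1, 2], lambda a, b: abs(a - b)))  # -> 1
```

Note how an unobserved class can be predicted when the model's probability mass surrounds it in the metric, which is the mechanism the abstract exploits.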
To address these challenges, he is focused on the intersection of data-centric ML (which aims to solve 1) and automated machine learning (AutoML) (which aims to solve 2), or more concisely, data-centric AutoML. As a result of these motivating challenges, his work on developing the foundations of data-centric AutoML has a focus on diverse ML tasks that are far afield from standard ML domains. These often include problems related to solving PDEs, protein folding, climate modeling, and beyond. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at andrew.cmu.edu Tue Oct 17 09:47:06 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 17 Oct 2023 09:47:06 -0400 Subject: [CMU AI Seminar] October 17 at 12pm (GHC 6115 & Zoom) -- Nicholas Roberts (UW Madison) -- Geometry-Aware Adaptation for Pretrained Models -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sun Oct 22 16:37:39 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 22 Oct 2023 16:37:39 -0400 Subject: [CMU AI Seminar] October 24 at 12pm (NSH 3305 & Zoom) -- Tongzhou Wang (MIT) -- Quasimetric Reinforcement Learning -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (10/24)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in NSH 3305 *with pizza provided* and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Tuesday (10/24), *Tongzhou Wang* (MIT) will be giving a talk titled *"Quasimetric Reinforcement Learning"*.
*Title*: Quasimetric Reinforcement Learning *Talk Abstract*: In goal-reaching agents, how are strategies for different goals related? Can we solve goal-reaching reinforcement learning (RL) with a sufficiently good representation of states and goals? In this talk, I will present a method for training high-performance optimal goal-reaching agents by learning a quasimetric geometry. This talk consists of three parts: 1. Goal-Reaching RL == _Quasimetric_ Geometry Learning. 2. How to represent this geometry? Deep quasimetric models. 3. How to learn this geometry from local transitions? A geometric argument based on quasimetric properties. *Speaker Bio:* Tongzhou is a final-year PhD student at MIT, advised by Phillip Isola and Antonio Torralba. His research interests lie in structures in machine learning and artificial agents, focusing on learning structured representations for better perception and decision-making. His work spans representation learning, reinforcement learning, and machine learning. Tongzhou co-organized the Goal-Conditioned Reinforcement Learning workshop at NeurIPS 2023, bridging researchers and practitioners across machine learning and decision-making. Before his PhD study, Tongzhou received his bachelor's degree from UC Berkeley while working with Stuart Russell, Alyosha Efros, and Ren Ng, and was an early member of the PyTorch team at Facebook AI Research. *In person:* NSH 3305 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sun Oct 22 22:51:55 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 22 Oct 2023 22:51:55 -0500 Subject: [CMU AI Seminar] Special! October 25 at *10am* (GHC 6115 & Zoom) -- Sanae Lotfi (NYU) -- Are the Marginal Likelihood and PAC-Bayes Bounds the right proxies for Generalization?
-- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Wednesday (10/25)* from *10:00-11:00 AM (U.S. Eastern time)* for a special installment of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 and will be streamed on Zoom. *(Note the earlier time!)* To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Wednesday (10/25), *Sanae Lotfi* (NYU) will be giving a talk titled *"Are the Marginal Likelihood and PAC-Bayes Bounds the right proxies for Generalization?"*. *Title*: Are the Marginal Likelihood and PAC-Bayes Bounds the right proxies for Generalization? *Talk Abstract*: How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood, which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question. We first highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how the marginal likelihood can be negatively correlated with generalization and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show to be more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning. PAC-Bayes bounds are another expression of Occam's razor, where simpler descriptions of the data generalize better. While there has been progress in developing tighter PAC-Bayes bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works.
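For context, the log marginal likelihood the abstract refers to decomposes by the chain rule into a sum of one-step-ahead predictive terms, and the conditional variant scores only the later terms; the cutoff m below is notation added here for illustration, not taken from the talk.

```latex
% Chain-rule decomposition of the log marginal likelihood of data
% D = (d_1, ..., d_n) under model M, and the conditional variant that
% conditions on the first m points:
\log p(\mathcal{D} \mid \mathcal{M})
  = \sum_{i=1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M}),
\qquad
\log p(\mathcal{D}_{>m} \mid \mathcal{D}_{\le m}, \mathcal{M})
  = \sum_{i=m+1}^{n} \log p(d_i \mid d_{<i}, \mathcal{M}).
```

The first sum rewards models that predict well from the prior alone, while the conditional sum discards the early low-data terms, which is one way to see why it can track generalization more closely.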
In this talk, I will also present our compression approach based on quantizing neural network parameters in a linear subspace, which profoundly improves on previous results to provide state-of-the-art generalization bounds on a variety of tasks. We use these tight bounds to better understand the role of model size, equivariance, and the implicit biases of optimization for generalization in deep learning. Notably, our work shows that large models can be compressed to a much greater extent than previously known. Finally, I will discuss the connection between the marginal likelihood and PAC-Bayes bounds for model selection. *Speaker Bio:* Sanae Lotfi is a PhD student at NYU, advised by Professor Andrew Gordon Wilson. Sanae works on the foundations of deep learning. Her goal is to understand and quantify generalization in deep learning, and use this understanding to build more robust and reliable machine learning models. Sanae's PhD research has been recognized with an ICML Outstanding Paper Award and is generously supported by the Microsoft and DeepMind Fellowships, the Meta AI Mentorship Program and the NYU CDS Fellowship. Prior to joining NYU, Sanae obtained a Master's degree in applied mathematics from Polytechnique Montreal, where she worked on designing stochastic first- and second-order algorithms with compelling theoretical and empirical properties for large-scale optimization. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ashert at andrew.cmu.edu Tue Oct 24 10:56:30 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 24 Oct 2023 10:56:30 -0400 Subject: [CMU AI Seminar] October 24 at 12pm (NSH 3305 & Zoom) -- Tongzhou Wang (MIT) -- Quasimetric Reinforcement Learning -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: <8980C985-830A-4096-8959-EBAD8E76C28C@andrew.cmu.edu> An HTML attachment was scrubbed... URL: From ashert at andrew.cmu.edu Tue Oct 24 11:52:23 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 24 Oct 2023 11:52:23 -0400 Subject: [CMU AI Seminar] October 24 at 12pm (NSH 3305 & Zoom) -- Tongzhou Wang (MIT) -- Quasimetric Reinforcement Learning -- AI Seminar sponsored by SambaNova Systems In-Reply-To: <8980C985-830A-4096-8959-EBAD8E76C28C@andrew.cmu.edu> References: <8980C985-830A-4096-8959-EBAD8E76C28C@andrew.cmu.edu> Message-ID: <0895334E-D99A-4501-9BDE-0E426332AD82@andrew.cmu.edu> An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Wed Oct 25 11:10:53 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Wed, 25 Oct 2023 11:10:53 -0400 Subject: [CMU AI Seminar] Special! October 26 at 12pm (NSH 4305 & Zoom) -- Sherry Yang (Google DeepMind) -- Foundation Models for Decision Making: Problems, Methods, and Applications -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Thursday (10/26)* from *12:00-1:00 PM (U.S. Eastern time)* for a special installment of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in *NSH 4305* and will be streamed on Zoom. *Note the different room! (NSH 4305)* To learn more about the seminar series or to see the future schedule, please visit the seminar website. Tomorrow, this Thursday (10/26), *Sherry Yang* (Google DeepMind / UC Berkeley) will be giving a talk titled *"Foundation Models for Decision Making: Problems, Methods, and Applications"*.
*Title*: Foundation Models for Decision Making: Problems, Methods, and Applications *Talk Abstract*: Foundation models pretrained on internet vision and language data have greatly advanced artificial intelligence in vision and language tasks. However, many tasks ranging from controlling physical systems to making scientific discoveries do not operate in image or text space while only having limited data available for learning. How to achieve behaviors better than the dataset in these broader tasks with limited data becomes a pressing challenge to today's learning systems. In this talk, we will provide three foundation-model-inspired approaches, including representation learning, conditional generative modeling, and repurposing pretrained vision and language models, to improve learning for the broader set of tasks that involve control and decision making. *Speaker Bio:* Sherry is a final-year PhD student at UC Berkeley advised by Pieter Abbeel and a senior research scientist at Google DeepMind. Her research interests include imitation learning, deep reinforcement learning, and recently foundation models for decision making. She has worked on offline policy evaluation and selection, provably beneficial representation learning for imitation, reinforcement learning for natural language generation, and generative modeling for control, planning, and decision making. Sherry has served on the program committee of IJCAI and as a reviewer for ICML, NeurIPS, ICLR, AISTATS, and AAAI. She is the lead organizer for the Foundation Models for Decision Making workshop at NeurIPS 2022 and 2023, bringing together research communities in vision, language, planning, and reinforcement learning to solve complex decision making tasks at scale. Before her current role, Sherry received her Bachelor's and Master's degrees from MIT, advised by Patrick Winston and Julian Shun.
*In person:* NSH 4305 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Thu Oct 26 10:48:54 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Thu, 26 Oct 2023 10:48:54 -0400 Subject: [CMU AI Seminar] Special! October 26 at 12pm (NSH 4305 & Zoom) -- Sherry Yang (Google DeepMind) -- Foundation Models for Decision Making: Problems, Methods, and Applications -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: Reminder that this is happening today (in a different room than usual, NSH 4305). On Wed, Oct 25, 2023 at 11:10 AM Asher Trockman wrote: > Dear all, > > We look forward to seeing you *this Thursday (10/26)* from *12:00-1:00 > PM (U.S. Eastern time)* for a special installment of this semester's > *CMU AI Seminar*, sponsored by SambaNova Systems. > The seminar will be held in *NSH 4305* and will be streamed on Zoom. *Note > the different room! (NSH 4305)* > > To learn more about the seminar series or to see the future schedule, > please visit the seminar website. > > Tomorrow, this Thursday (10/26), *Sherry Yang* (Google DeepMind / UC > Berkeley) will be giving a talk titled *"Foundation Models for Decision > Making: Problems, Methods, and Applications"*. > > *Title*: Foundation Models for Decision > Making: Problems, Methods, and > Applications > > *Talk Abstract*: Foundation models pretrained on internet vision and > language data have greatly advanced artificial intelligence in vision and > language tasks. However, many tasks ranging from controlling physical > systems to making scientific discoveries do not operate in image or text > space while only having limited data available for learning. How to achieve > behaviors better than the dataset in these broader tasks with limited data > becomes a pressing challenge to today's learning systems.
In this talk, we > will provide three foundation-model-inspired approaches, including > representation learning, conditional generative modeling, and repurposing > pretrained vision and language models, to improve learning for the broader > set of tasks that involve control and decision making. > > *Speaker Bio:* Sherry is a final-year PhD student at UC Berkeley advised > by Pieter Abbeel and a senior research scientist at Google DeepMind. Her > research interests include imitation learning, deep reinforcement learning, > and recently foundation models for decision making. She has worked on > offline policy evaluation and selection, provably beneficial representation > learning for imitation, reinforcement learning for natural language > generation, and generative modeling for control, planning, and decision > making. Sherry has served on the program committee of IJCAI and as a reviewer for > ICML, NeurIPS, ICLR, AISTATS, and AAAI. She is the lead organizer for the > Foundation Models for Decision Making workshop at NeurIPS 2022 and 2023, > bringing together research communities in vision, language, planning, and > reinforcement learning to solve complex decision making tasks at scale. > Before her current role, Sherry received her Bachelor's and Master's degrees > from MIT, advised by Patrick Winston and Julian Shun. > > *In person:* NSH 4305 > *Zoom Link*: > https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 > > Thanks, > Asher Trockman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sun Oct 29 17:01:09 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sun, 29 Oct 2023 16:01:09 -0500 Subject: [CMU AI Seminar] October 31 at 12pm (GHC 6115 & Zoom) -- Abhin Shah (MIT) -- Group Fairness with Uncertainty in Sensitive Attributes -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (10/31)* from *12:00-1:00 PM (U.S.
Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. (Note: the speaker will be virtual, but we will stream the talk in the room.) To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Tuesday (10/31), *Abhin Shah* (MIT) will be giving a talk titled *"Group Fairness with Uncertainty in Sensitive Attributes"*. *Title*: Group Fairness with Uncertainty in Sensitive Attributes *Talk Abstract*: Learning a fair predictive model is crucial to mitigate biased decisions against minority groups in high-stakes applications. A common approach to learn such a model involves solving an optimization problem that maximizes the predictive power of the model under an appropriate group fairness constraint. However, in practice, sensitive attributes are often missing or noisy, resulting in uncertainty. We demonstrate that solely enforcing fairness constraints on uncertain sensitive attributes can fall significantly short in achieving the level of fairness of models trained without uncertainty. To overcome this limitation, we propose a bootstrap-based algorithm that achieves better levels of fairness despite the uncertainty in sensitive attributes. The algorithm is guided by a Gaussian analysis for the independence notion of fairness, where we propose a robust quadratically constrained quadratic problem to ensure a strict fairness guarantee with uncertain sensitive attributes. Our algorithm is applicable to both discrete and continuous sensitive attributes and is effective in real-world classification and regression tasks for various group fairness notions, e.g., independence and separation. *Speaker Bio:* Abhin Shah is a final-year Ph.D. student in the EECS department at MIT, advised by Prof. Devavrat Shah and Prof. Greg Wornell. He is a recipient of MIT's Jacobs Presidential Fellowship.
He interned at Google Research in 2021 and at IBM Research in 2020. Prior to MIT, he graduated from IIT Bombay with a Bachelor's degree in Electrical Engineering. His research interests include theoretical and applied aspects of trustworthy machine learning with a focus on causality and fairness. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at andrew.cmu.edu Tue Oct 31 11:11:40 2023 From: ashert at andrew.cmu.edu (Asher Trockman) Date: Tue, 31 Oct 2023 10:11:40 -0500 Subject: [CMU AI Seminar] October 31 at 12pm (GHC 6115 & Zoom) -- Abhin Shah (MIT) -- Group Fairness with Uncertainty in Sensitive Attributes -- AI Seminar sponsored by SambaNova Systems In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Wed Nov 1 13:06:08 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Wed, 1 Nov 2023 12:06:08 -0500 Subject: [CMU AI Seminar] Special! November 2 at *1pm* (NSH 3305 & Zoom) -- Larry Zitnick (FAIR) -- Modeling Atoms to Address Our Climate Crisis -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *tomorrow, this Thursday (11/2)* from *1:00-2:00 PM (U.S. Eastern time)* for a special installment of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in NSH 3305 and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website. Tomorrow (11/2), *Larry Zitnick* (FAIR) will be giving a talk titled *"Modeling Atoms to Address Our Climate Crisis"*. *Title*: Modeling Atoms to Address Our Climate Crisis *Talk Abstract*: Climate change is a societal and political problem whose impact could be mitigated by technology.
Underlying many of its technical challenges is a surprisingly simple yet challenging problem: modeling the interaction of atoms. In this talk, we motivate the problem and provide insights into how this opens up intriguing new directions for machine learning and AI researchers. Recent large-scale datasets released by the Open Catalyst Project enable the training of ML models that generalize across a broad range of the chemical space. Analogies are drawn to computer vision to map recent state-of-the-art approaches for atomic modeling to a more familiar domain. We conclude by exploring the numerous open problems and their potential for wide-ranging impact beyond climate change. *Speaker Bio:* Larry Zitnick is a research director on the Fundamental AI Research team at Meta. He is currently focused on scientific applications of AI and machine learning, such as the discovery of new catalysts for renewable energy applications. Previously, his research in computer vision covered many areas such as the FastMRI project to speed up the acquisition of MRIs, and the COCO and VQA datasets to benchmark object detection and visual language tasks. He developed the PhotoDNA technology used by industry and various law enforcement agencies to combat illegal imagery on the web. Before joining FAIR, he was a principal researcher at Microsoft Research. He received his PhD in robotics from Carnegie Mellon University. *In person:* NSH 3305 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ashert at cs.cmu.edu Sat Nov 4 14:25:15 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sat, 4 Nov 2023 13:25:15 -0500 Subject: [CMU AI Seminar] November 7 at 12pm (GHC 6115 & Zoom) -- Asher Trockman (CMU) -- Mimetic Initialization for Transformers and Convolutional Networks -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Tuesday (11/7)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Tuesday (11/7), *Asher Trockman* (CMU) will be giving a talk titled *"Mimetic Initialization for Transformers and Convolutional Networks"* to describe a new class of initialization techniques for deep networks and how this was inspired by a very simple convolutional network architecture. *Title*: Mimetic Initialization for Transformers and Convolutional Networks *Talk Abstract*: While neural network weights are typically initialized randomly from univariate distributions, pre-trained weights often have visually discernible multivariate structure. In recent work, we propose a technique called "mimetic initialization" that aims to replicate such structures when initializing convolutional networks and Transformers. We handcraft a class of multivariate Gaussian distributions to initialize filters for depthwise convolutional layers, and we initialize the query and key weights for self-attention layers such that their product approximates the identity. Mimetic initialization substantially reduces training time and increases final accuracy on various common benchmarks.
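A toy version of the query-key idea in this abstract (initialize W_q and W_k so that W_q W_k^T approximates a scaled identity) can be sketched in pure Python. The alpha/beta mixing below is an invented illustration of the general idea, not the paper's exact recipe.

```python
# Toy sketch of mimetic query-key initialization: draw W_q and W_k as
# correlated Gaussians so that W_q @ W_k.T has a dominant identity
# component. The alpha/beta mixing is an invented illustration, not the
# paper's exact recipe. Pure Python (no numpy) to stay self-contained.
import random

random.seed(0)

def gaussian_matrix(rows, cols, std):
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def mimetic_qk_init(d_model, d_head, alpha=0.7, beta=0.3):
    std = (1.0 / d_model) ** 0.5
    base = gaussian_matrix(d_head, d_model, std)
    noise = gaussian_matrix(d_head, d_model, std)
    w_q = base  # query projection
    # Key projection shares most of its direction with the queries,
    # so <q_i, k_i> concentrates near alpha while <q_i, k_j> stays small.
    w_k = [[alpha * b + beta * e for b, e in zip(rb, re)]
           for rb, re in zip(base, noise)]
    return w_q, w_k

def matmul_transpose(a, b):  # computes a @ b.T
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

w_q, w_k = mimetic_qk_init(d_model=256, d_head=32)
prod = matmul_transpose(w_q, w_k)
diag = sum(prod[i][i] for i in range(32)) / 32
off = sum(abs(prod[i][j]) for i in range(32) for j in range(32) if i != j) / (32 * 31)
# diag lands near alpha (~0.7) while the mean |off-diagonal| entry stays
# near zero, i.e. W_q @ W_k.T approximates a scaled identity.
```

With such an initialization, attention starts out close to the "attend to similar tokens" pattern often seen in pre-trained Transformers, which is the structure the talk proposes to mimic.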
Our technique enables us to almost close the gap between untrained and pre-trained Vision Transformers on small datasets like CIFAR-10, achieving up to a 6% gain in accuracy through initialization alone. For convolutional networks like ConvMixer and ConvNeXt, we observe improvements in accuracy and reductions in training time, even when convolutional filters are frozen (untrained) after initialization. Overall, our findings suggest that the benefits of pre-training can be separated into two components: serving as a good initialization and storing transferable knowledge, with the former being simple enough to (at least partially) capture by hand in closed form. *Speaker Bio:* Asher Trockman is a PhD student at Carnegie Mellon University advised by Zico Kolter. He researches deep learning for vision and deep learning phenomena generally. *In person:* GHC 6115 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Fri Nov 10 16:07:30 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Fri, 10 Nov 2023 16:07:30 -0500 Subject: [CMU AI Seminar] Special! November 13 at 12pm (GHC 8102 & Zoom) -- Yilun Du (MIT) -- Learning to Generate Compositionally -- AI Seminar sponsored by SambaNova Systems Message-ID: Dear all, We look forward to seeing you *this Monday (11/13)* from *12:00-1:00 PM (U.S. Eastern time)* for a special installment of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 8102 *with pizza provided* and will be streamed on Zoom. *You can sign up to meet with the speaker here.* To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Monday (11/13), *Yilun Du* (MIT) will be giving a talk titled *"Learning to Generate Compositionally"*.
*Title*: Learning to Generate Compositionally *Talk Abstract*: Generative AI has led to stunning successes in recent years but is fundamentally limited by the amount of data available. In this talk, I'll introduce the idea of compositional generative modeling, which can help avoid this issue by building complex generative models from smaller constituent components. First, I introduce the idea of energy-based models and illustrate how they enable compositional generative modeling. I'll then illustrate how such compositionality can enable effective generalization, both to complex visual scenes and robotic actions unseen at training time. Finally, I'll show how such compositionality can be applied to existing large "foundation models" to construct intelligent decision-making agents that can hierarchically plan and reason. *Speaker Bio:* Yilun Du is a final-year PhD student at MIT EECS advised by Prof. Leslie Kaelbling, Prof. Tomas Lozano-Perez, and Prof. Joshua B. Tenenbaum. Previously, he was a research fellow at OpenAI, and an intern and visiting researcher at FAIR and Google DeepMind. His research focuses on generative models, decision making, robot learning, 3D vision, embodied agents and the applications of such tools to scientific domains. *In person:* GHC 8102 *Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09 Thanks, Asher Trockman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashert at cs.cmu.edu Sat Nov 11 17:43:19 2023 From: ashert at cs.cmu.edu (Asher Trockman) Date: Sat, 11 Nov 2023 17:43:19 -0500 Subject: =?UTF-8?Q?=5BCMU_AI_Seminar=5D_=E2=9C=A8=F0=9F=90=A6_November_14_at_12pm_=28GHC_61?= =?UTF-8?Q?15_=26_Zoom=29_=2D=2D_Cyril_Zhang_=28MSR=29_=2D=2D_Overstepping_the_Descent_?= =?UTF-8?Q?Lemma_=2D=2D_AI_Seminar_sponsored_by_SambaNova_System?= Message-ID: Dear all, We look forward to seeing you *this Tuesday (11/14)* from *12:00-1:00 PM (U.S.
Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in GHC 6115 *with pizza provided* and will be streamed on Zoom. *Please email me if you would like to schedule a meeting with Cyril.* To learn more about the seminar series or to see the future schedule, please visit the seminar website. On this Tuesday (11/14), *Cyril Zhang* (Microsoft Research) will be giving a talk titled *"Overstepping the Descent Lemma"*. *Title*: Overstepping the Descent Lemma *Talk Abstract*: What are the dynamics of gradient-based algorithms for optimizing neural networks? By what principles should we design update rules for deep learning? These are extremely messy questions, to which there are no canonical answers yet. In attempting to address these mysteries with our cherished theoretical frameworks, we face a recurring theme: a tension between over-idealization and intractability. We'll discuss how asking "non-standard" questions in clean theoretical models can shed light on weird, wonderful, and empirically pertinent nuances of the trajectory of SGD: * Acceleration via large steps.* By staying within the paradise of low-noise convex quadratics, we show how making negative local progress can lead to faster global convergence, via a self-stabilizing "fractal" learning rate schedule. * Variance reduction without side effects.* We show how gradient stochasticity can cause catastrophic error amplification in the presence of feedback loops (like in offline RL or autoregressive language generation). Many variance reduction mechanisms help, but Polyak averaging is almost unreasonably effective; we discuss why it's hard to analyze all these moving parts. * Non-convex feature learning.* By taking a close look at how deep learning overcomes a "mildly cryptographic" computational obstruction (namely, learning a sparse parity), we arrive at a clean testbed for neural representation learning.
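The sparse-parity task mentioned in this abstract is easy to instantiate. Below is a minimal, hypothetical data generator for the (n, k)-sparse parity problem (the label is the product of k fixed hidden ±1 coordinates); it is an illustration of the testbed, not the speaker's code.

```python
# Minimal sketch of the (n, k)-sparse parity task mentioned above: each
# input is n random +/-1 bits, and the label is the parity (product) of
# k fixed hidden coordinates. Hypothetical illustration of the testbed,
# not the speaker's code.
import random

def sparse_parity_sample(n, support, rng=random):
    """Draw one labeled example. `support` is the list of hidden
    coordinates whose parity determines the label."""
    x = [rng.choice([-1, 1]) for _ in range(n)]
    y = 1
    for i in support:
        y *= x[i]
    return x, y

rng = random.Random(0)
x, y = sparse_parity_sample(n=30, support=[3, 11, 19], rng=rng)
# The label is fully determined by the three hidden coordinates:
assert y == x[3] * x[11] * x[19]
```

The task is hard for gradient-based learners precisely because the label is uncorrelated with any strict subset of the support, which makes it a clean proxy for studying feature learning.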
With this microscopic proxy for a single neuron's training dynamics, mysteries such as grokking, lottery tickets, and scaling laws are recognizable and analyzable.

Another recurring theme is that hard mathematical questions in this space are more clearly exposed by running targeted numerical experiments, including training deep networks on GPUs. I'll highlight some exciting progress that other groups have made in recent months.

Joint work with Naman Agarwal, Surbhi Goel, Adam Block, Dylan Foster, Akshay Krishnamurthy, Max Simchowitz, Boaz Barak, Ben Edelman, Sham Kakade, and Eran Malach.

*Speaker Bio:* Cyril Zhang is a Senior Researcher at Microsoft Research NYC. He has worked on learning and control in dynamical systems, online & stochastic optimization, and (most recently) a nascent theoretical, scientific, and algorithmic toolbox for neural reasoning. He holds a Ph.D. in Computer Science from Princeton University.

*In person:* GHC 6115
*Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman

From ashert at cs.cmu.edu Tue Nov 14 09:11:56 2023
From: ashert at cs.cmu.edu (Asher Trockman)
Date: Tue, 14 Nov 2023 09:11:56 -0500
Subject: Re: [CMU AI Seminar] ✨🐦 November 14 at 12pm (GHC 6115 & Zoom) -- Cyril Zhang (MSR) -- Overstepping the Descent Lemma -- AI Seminar sponsored by SambaNova System
In-Reply-To: References: Message-ID:

Reminder this is happening today!

From ashert at cs.cmu.edu Sun Dec 3 17:40:09 2023
From: ashert at cs.cmu.edu (Asher Trockman)
Date: Sun, 3 Dec 2023 17:40:09 -0500
Subject: [CMU AI Seminar] December 5 at 12pm (NSH 3305 & Zoom) -- Elan Rosenfeld (CMU) -- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization -- AI Seminar sponsored by SambaNova System
Message-ID:

Dear all,

We look forward to seeing you *this Tuesday (12/5)* from *12:00-1:00 PM (U.S. Eastern time)* for the next talk of this semester's *CMU AI Seminar*, sponsored by SambaNova Systems. The seminar will be held in NSH 3305 *with pizza provided* and will be streamed on Zoom.

To learn more about the seminar series or to see the future schedule, please visit the seminar website.
This Tuesday (12/5), *Elan Rosenfeld* (CMU) will be giving a talk titled *"Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization"*.

*Title*: Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization

*Talk Abstract*: There is a growing list of intriguing properties of neural network optimization, including specific patterns in their training dynamics (e.g. simplicity bias, edge of stability, grokking) and the unexplained effectiveness of various tools (e.g. batch normalization, SAM, Adam). Extensive study of these properties has so far yielded only a partial understanding of their origins, and their relation to one another is even less clear. What is it about gradient descent on neural networks that gives rise to these phenomena?

In this talk, I will present our recent experiments which offer a new perspective on many of these findings and suggest that they may have a shared underlying cause. Our investigation identifies and explores the significant influence of paired groups of outliers with what we call Opposing Signals: large-magnitude features that dominate the network's output throughout most of training and cause large gradients pointing in opposite directions.

Though our experiments shed some light on these outliers' influence, we lack a complete understanding of their precise effect on network training dynamics. Instead, I'll share our working hypothesis via a high-level explanation, and I'll describe initial experiments which verify some of its qualitative predictions. We hope a deeper understanding of this phenomenon will enable future principled improvements to neural network optimization.

*Speaker Bio:* Elan Rosenfeld is a final-year PhD student in CMU MLD advised by Profs. Andrej Risteski and Pradeep Ravikumar. His research focuses on principled approaches to understanding and improving robustness, representation learning, and generalization in deep learning.
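For intuition, the "Opposing Signals" picture described in the abstract can be sketched in a few lines. This is a toy construction of my own, not the paper's experiments: two hypothetical outliers share one dominant, large-magnitude feature but carry opposite labels, so their per-example gradients on that feature's weight are large and exactly opposed.

```python
import math

def grad_w(x, y, w):
    """d/dw of the logistic loss -log(sigmoid(y * w * x)), with y in {-1, +1}."""
    p = 1.0 / (1.0 + math.exp(-y * w * x))
    return -(1.0 - p) * y * x

# Hypothetical pair of outliers: same dominant feature value, opposite labels.
x_big = 10.0
w = 0.0
g_pos = grad_w(x_big, +1, w)  # gradient from the outlier labeled +1
g_neg = grad_w(x_big, -1, w)  # gradient from the outlier labeled -1

# The two gradients are large, equal in magnitude, and opposite in sign, so
# whichever group momentarily dominates drags the shared weight sharply its way.
print(g_pos, g_neg)  # -5.0 5.0
```

In a real network the same tug-of-war plays out across many weights at once, which is what makes the effect hard to see without targeted experiments.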
*In person:* NSH 3305
*Zoom Link*: https://cmu.zoom.us/j/99510233317?pwd=ZGx4aExNZ1FNaGY4SHI3Qlh0YjNWUT09

Thanks,
Asher Trockman

From ashert at andrew.cmu.edu Tue Dec 5 11:49:00 2023
From: ashert at andrew.cmu.edu (Asher Trockman)
Date: Tue, 5 Dec 2023 11:49:00 -0500
Subject: [CMU AI Seminar] December 5 at 12pm (NSH 3305 & Zoom) -- Elan Rosenfeld (CMU) -- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization -- AI Seminar sponsored by SambaNova System
In-Reply-To: References: Message-ID: <663FD6E2-E58A-4922-8D73-A837D8DCF94B@andrew.cmu.edu>

An HTML attachment was scrubbed...
URL:

From ashert at cs.cmu.edu Tue Dec 5 11:57:43 2023
From: ashert at cs.cmu.edu (Asher Trockman)
Date: Tue, 5 Dec 2023 11:57:43 -0500
Subject: [CMU AI Seminar] December 5 at 12pm (NSH 3305 & Zoom) -- Elan Rosenfeld (CMU) -- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization -- AI Seminar sponsored by SambaNova System
In-Reply-To: <663FD6E2-E58A-4922-8D73-A837D8DCF94B@andrew.cmu.edu> References: <663FD6E2-E58A-4922-8D73-A837D8DCF94B@andrew.cmu.edu> Message-ID:

NOTE: In NSH 3305.

On Tue, Dec 5, 2023 at 11:49 AM Asher Trockman wrote:
> Reminder this is happening soon!

From ashert at cs.cmu.edu Tue Dec 5 13:13:54 2023
From: ashert at cs.cmu.edu (Asher Trockman)
Date: Tue, 5 Dec 2023 13:13:54 -0500
Subject: Fwd: S3D Seminar Series: Yi Wu - Friday 12/8 at 12pm - Language Model meets Reinforcement Learning: Building Strong Language Agents for Strategic Gameplay
In-Reply-To: References: <3aa10982dcb0396dc68e05fbe3ef55ad@mail.gmail.com> Message-ID:

---------- Forwarded message ---------
From: Linda Campbell
Date: Thu, Nov 30, 2023 at 9:57 AM
Subject: S3D Seminar Series: Yi Wu - Friday 12/8
To: S3D Seminar Series

Friday, December 8, 2023, 12:00 p.m. - 1:15 p.m.
In person at TCS 358
Or online via Zoom: https://cmu.zoom.us/j/93910552516?pwd=S1hRUXhJendVYzBEUC94dlNvd0Y2UT09
*Lunch provided starting at 11:45 a.m.*

*Title:* Language Model meets Reinforcement Learning: Building Strong Language Agents for Strategic Gameplay

*Speaker:* Yi Wu, Assistant Professor, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University

*Abstract:* Thanks to the advances in large language models (LLMs), there has been a recent trend to develop intelligent language agents for complex tasks. Most existing applications of language agents are purely LLM-based, i.e., by directly prompting LLMs to output actions. Although interesting emergent behaviors can be observed, their performance in complex multi-agent games can be limited due to the lack of domain-specific training. This talk will cover some recent projects in my group on developing language agents that can both yield strong gameplay performance and cooperate with real human players in challenging multi-agent games. The key idea is to combine language modeling and reinforcement learning.
The language model will serve as an interface for reasoning and interpreting high-level commands, while reinforcement learning helps substantially improve the gameplay performance of the agent. We demonstrate our agents in three domains, including an agent that can follow high-level commands to play a real-time strategy game, an Overcooked agent that can cooperate with humans via language to cook dishes, and an agent that outperforms average human players in the Werewolf game.

*Bio:* Yi Wu is now an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University. He obtained his Ph.D. degree from UC Berkeley in 2019 under the supervision of Prof. Stuart Russell. Before moving back to Tsinghua, Yi was a full-time researcher at OpenAI. His research focuses on improving the generalization capabilities of learning agents. He is broadly interested in a variety of topics in AI, including multi-agent reinforcement learning, human-AI interaction, language grounding, and robot learning. His representative works include the MADDPG/MAPPO algorithm, OpenAI's hide-and-seek project, and the value iteration network, which won the best paper award at NIPS 2016.

*Upcoming S3D Seminar Series Talks*
February 7: George Fairbanks*
TBD: Yasemin Acar*
April 17: Mani Srivastava*
April 24: Premkumar Devanbu*
May 1: David Rand* (joint with CSS)
*indicates part of the Distinguished Speakers Series
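The division of labor in the Yi Wu talk above (a language model interpreting high-level commands, an RL-trained policy making the low-level gameplay decisions) can be caricatured in a few lines. Everything here is invented for illustration; the two stubs merely stand in for an LLM call and a learned policy.

```python
# Hypothetical sketch: an LLM-as-interface parses a high-level command into a
# structured goal; an RL-trained policy (stubbed here) maps (goal, state) to a
# low-level action. All function names and game logic are invented.

def parse_command(command: str) -> str:
    """Stand-in for an LLM call that turns free-form text into a goal token."""
    text = command.lower()
    if "attack" in text:
        return "goal:attack"
    if "defend" in text:
        return "goal:defend"
    return "goal:explore"

def rl_policy(goal: str, state: dict) -> str:
    """Stand-in for a policy that, in the talk's setting, would be trained with RL."""
    if goal == "goal:attack" and state.get("enemy_visible"):
        return "engage"
    if goal == "goal:defend":
        return "hold_position"
    return "scout"

def agent_step(command: str, state: dict) -> str:
    # The LLM interface and the RL policy compose into one agent step.
    return rl_policy(parse_command(command), state)

print(agent_step("Attack the base on the left!", {"enemy_visible": True}))  # engage
```

The point of the split, as described in the abstract, is that the hard-coded `rl_policy` stub is exactly the part domain-specific RL training would replace.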