From awd at cs.cmu.edu Fri Apr 3 06:47:25 2026
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 3 Apr 2026 06:47:25 -0400
Subject: If you do multimodal or vision AI you should check this out
Message-ID:

https://x.com/heygurisingh/status/2039012548260082082?s=20

From awd at cs.cmu.edu Fri Apr 3 18:34:08 2026
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 3 Apr 2026 18:34:08 -0400
Subject: Fwd: If you do multimodal or vision AI you should check this out
In-Reply-To:
References:
Message-ID:

Sharing Szymon's insights, as I think they have broader appeal.

Also, the paper confirms our much earlier observation that AI benchmarks do not measure up to the models they are supposed to assess. That's why we have invested time and effort to develop benchmarking *frameworks* that let us dynamically generate new benchmarks that will hopefully stay ahead of the capabilities of AI technology as it continues to evolve. Basically, putting the horses in front of the carriage again. Big thanks to the TimeSeriesGym and TimeSeriesExamAgent teams for spearheading these efforts here at the Auton Lab!

Cheers,
Artur

PS It is hard to blame an AI model for accomplishing its tasks with whatever we give it. It was always the case in ML that we (or our AI agents these days) should be careful to test the models properly, to make sure they do things the way we expect them to.

---------- Forwarded message ---------
From: Szymon Rusiecki
Date: Fri, Apr 3, 2026 at 9:57 AM
Subject: Re: If you do multimodal or vision AI you should check this out
To: Artur Dubrawski

After reproducing the methodology presented in their paper, I found that the mirage issue occurs only for "big" models. The "small" ones often don't have this issue.

On Fri, Apr 3, 2026 at 15:41 Szymon Rusiecki wrote:

> I am actually surprised, as I recently broke my collarbone, so I decided to
> test Gemini 3 Flash on an OOD sample (I think Google doesn't have an
> image from my iPhone, and even if it does, the photo doesn't have any description)
> with the prompt "what do you see on this image?" and it responded with the same
> answer as my doctor.
>
> SR
>
> On Fri, Apr 3, 2026 at 12:48 Artur Dubrawski wrote:
>
>> https://x.com/heygurisingh/status/2039012548260082082?s=20
>>

From awd at cs.cmu.edu Tue Apr 7 14:48:36 2026
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 7 Apr 2026 14:48:36 -0400
Subject: Fwd: RI PhD Thesis Proposal - Xinyu (Rachel) Li
In-Reply-To:
References:
Message-ID:

Team,

Please come and witness Rachel giving her excellent thesis proposal presentation on Monday next week!

Cheers,
Artur

---------- Forwarded message ---------
From: RI PhD Program Manager
Date: Tue, Apr 7, 2026 at 2:45 PM
Subject: RI PhD Thesis Proposal - Xinyu (Rachel) Li
To: RI People

*RI CALENDAR EVENT*

*Date:* April 13, 2026
*Time:* 03:15 PM (ET)
*Location:* GHC 6121
*Zoom Link*

*Type:* Ph.D. Thesis Proposal
*Who:* Xinyu (Rachel) Li
*Title:* Towards Accessible AI Agents

*Abstract:*
Empowered by large language models (LLMs), AI agents have shown strong potential across tasks such as general-purpose assistance, software coding, and scientific research. However, their practical utility in applications involving consequential decisions, such as healthcare, remains constrained by three major challenges.

*Evaluation.* Existing agent evaluations often focus on well-structured tasks and final outcomes, failing to fully capture the complexity of real-world workflows.
We propose evaluation frameworks grounded in realistic machine learning engineering workflows, providing skill-based, multi-artifact, and holistic assessments that systematically evaluate the practical utility of AI agents.

*Learning.* Improving LLMs for agentic use typically relies on reinforcement learning with large amounts of high-quality labeled data, which are costly and difficult to obtain in expert domains including healthcare. To address this limitation, we aim to develop learning frameworks that require minimal external supervision, improving the scalability and efficiency of agent learning.

*Specialization.* AI agents typically follow a one-size-fits-all paradigm at the time of deployment, lacking mechanisms to account for task-specific or user-specific requirements. We propose methods that enable agent specialization for downstream tasks and users, expanding their applicability across heterogeneous deployment settings.

This thesis aims to make AI agents more broadly accessible and impactful in important real-world applications by enhancing their practical utility, making them more measurable, more capable, and better tailored to the needs of their users and applications.

*Link to thesis*

*Thesis committee members:*
Artur Dubrawski (Chair)
Andrea Bajcsy
Barnabás Póczos
Daniel McDuff (Google)

From awd at cs.cmu.edu Mon Apr 27 14:09:19 2026
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 27 Apr 2026 14:09:19 -0400
Subject: Mark your calendars! [RI PhD Thesis Proposal - Willa Potosnak]
Message-ID:

Willa will be giving her proposal presentation on Monday next week during our usual brainstorming session time slot, but in Gates Hall. Please come and join. Intellectual fun guaranteed!

Cheers,
Artur

---------- Forwarded message ---------
From: RI PhD Program Manager
Date: Mon, Apr 27, 2026, 1:52 PM
Subject: RI PhD Thesis Proposal - Willa Potosnak
To: RI People

*RI EVENT CALENDAR*

*Date:* May 4th, 2026
*Time:* 4:30 PM-6:00 PM
*Location:* GHC 4405
*Zoom Link*

*Type:* RI PhD Thesis Proposal
*Who:* Willa Potosnak
*Title:* Forecasting at Scale with Efficient Deep Learning Architectures

*Abstract:*
Time Series Foundation Models (TSFMs) have scaled rapidly, with publicly reported pretraining corpora growing from 1.23 billion to 1 trillion data points between 2024 and 2026, an approximately 800× increase in two years. Recent work has further supplemented real-world data with synthetic data to expose models to broader time series patterns. Yet, this data-centric paradigm raises a fundamental question: *must intelligent forecasting rely solely on scale, or can intentional architectural design unlock better generalization?*

This thesis proposes that more intelligently and efficiently leveraging existing data, rather than scale alone, is key to achieving better forecasting generalization. We pursue this through three parallel architectural themes: exploiting cross-channel structure beyond temporal patterns, enabling zero-shot generalization through structured composition, and reducing gradient and forecast variance by design. Each theme aims to enhance generalization with available data while treating computational efficiency as a core design principle.
In this thesis, we demonstrate that scale is not the only path to generalization by: developing multivariate architectures that leverage cross-channel dependencies efficiently while reducing forecast error; showing that architectures can generalize beyond their training distribution in both patterns and concepts; and verifying variance-aware architectural designs that extract richer training signals from existing data, provably reducing gradient variance while reducing forecast error and improving calibration.

Within the first theme, we further propose pretraining strategies for multivariate TSFMs to investigate whether data balancing and curriculum learning can improve downstream generalization given the same pretraining corpora. Within the second theme, we propose an additional dimension of generalization, extending beyond pattern and concept generalization to horizon generalization, an important consideration for TSFMs applied across diverse tasks and domains. Overall, this work contributes new insights into advancing time series forecasting generalization through efficient architectural design.

*Committee:*
Artur Dubrawski, Chair
John Dolan
Barnabás Póczos
Michael W. Mahoney (University of California, Berkeley)

*Thesis Link*