Fwd: RI Ph.D. Thesis Proposal: Brian Yang
Jeff Schneider
jeff4 at andrew.cmu.edu
Fri Sep 6 09:51:28 EDT 2024
Brian's thesis proposal is starting in 10 minutes! Please come by and
hear about diffusion models, language models, and self-driving cars!
-------- Forwarded Message --------
Subject: Re: RI Ph.D. Thesis Proposal: Brian Yang
Date: Fri, 6 Sep 2024 08:53:35 -0400
From: Brian Yang <brianyan at andrew.cmu.edu>
To: Suzanne Muth <lyonsmuth at cmu.edu>
CC: RI People <ri-people at andrew.cmu.edu>
Reminder that this is happening today at 10am in NSH 4305 and on Zoom:
https://cmu.zoom.us/j/93129020623?pwd=zMN1mXaUgsju0ORfKMZLFzhzdw6QDR.1
On Thu, Aug 29, 2024 at 8:37 AM Suzanne Muth <lyonsmuth at cmu.edu> wrote:
*Date:* 06 September 2024
*Time:* 10:00 a.m. (ET)
*Location:* NSH 4305
*Zoom Link:*
https://cmu.zoom.us/j/93129020623?pwd=zMN1mXaUgsju0ORfKMZLFzhzdw6QDR.1
*Type:* Ph.D. Thesis Proposal
*Who:* Brian Yang
*Title:* Teaching Robots to Drive: Scalable Policy Improvement via
Human Feedback
*Abstract:*
A long-standing problem in autonomous driving is grappling with the
long-tail of rare scenarios for which little or no data is
available. Although learning-based methods scale with data, it is
unclear that simply ramping up data collection will eventually make
this problem go away. Approaches which rely on simulation or world
modeling offer some relief, but building such models is very
challenging and in itself an active area of research.
On the other hand, humans can learn to drive without millions of
logged driving miles or the ability to precisely predict the
trajectories of all dynamic actors in the scene. This suggests a
potential alternative path to learning robust driving policies which
does not rely on highly accurate world models or enormous driving
datasets -- one which leans into human preferences and expertise as
an untapped source of supervision for training driving policies.
This thesis aims to make the case for human feedback as a rich
signal for improving driving policies in a sample-efficient manner
without requiring high-fidelity simulation. First, we propose a
method for guiding driving policies at test-time using unseen
black-box reward functions. We can then synthesize reward functions
using natural language and optimize them online, allowing us to
solve novel tasks zero-shot using only language supervision. Next,
we show how driving policies can be fine-tuned offline using human
preference data. By eliciting preferences over high-level intents,
we can use human feedback to effectively relabel sub-optimal driving
demonstrations and improve on-road driving performance. As future
work, we aim to combine these two methods to fine-tune driving
policies offline using natural language corrections, which should
enable richer feedback over longer horizons and chain-of-thought
distillation.
*Thesis Committee Members:*
Katerina Fragkiadaki, Co-chair
Jeff Schneider, Co-chair
Maxim Likhachev
Philipp Krähenbühl, The University of Texas at Austin
A draft of the thesis proposal document is available here
<https://drive.google.com/file/d/1n4mehD6jWXFZp3-ksjIecbxU7mIXbcxB/view?usp=sharing>.