Fwd: RI Ph.D. Thesis Proposal: Brian Yang
Jeff Schneider
jeff4 at andrew.cmu.edu
Fri Sep 6 09:51:28 EDT 2024
Brian's thesis proposal is starting in 10 minutes! Please come by and
hear about diffusion models, language models, and self-driving cars!
-------- Forwarded Message --------
Subject: Re: RI Ph.D. Thesis Proposal: Brian Yang
Date: Fri, 6 Sep 2024 08:53:35 -0400
From: Brian Yang <brianyan at andrew.cmu.edu>
To: Suzanne Muth <lyonsmuth at cmu.edu>
CC: RI People <ri-people at andrew.cmu.edu>
Reminder that this is happening today at 10am in NSH 4305 and on Zoom:
https://cmu.zoom.us/j/93129020623?pwd=zMN1mXaUgsju0ORfKMZLFzhzdw6QDR.1
On Thu, Aug 29, 2024 at 8:37 AM Suzanne Muth <lyonsmuth at cmu.edu> wrote:
*Date:* 06 September 2024
*Time:* 10:00 a.m. (ET)
*Location:* NSH 4305
*Zoom Link:*
https://cmu.zoom.us/j/93129020623?pwd=zMN1mXaUgsju0ORfKMZLFzhzdw6QDR.1
*Type:* Ph.D. Thesis Proposal
*Who:* Brian Yang
*Title:* Teaching Robots to Drive: Scalable Policy Improvement via
Human Feedback
*Abstract:*
A long-standing problem in autonomous driving is grappling with the
long-tail of rare scenarios for which little or no data is
available. Although learning-based methods scale with data, it is
unclear that simply ramping up data collection will eventually make
this problem go away. Approaches which rely on simulation or world
modeling offer some relief, but building such models is very
challenging and in itself an active area of research.
On the other hand, humans can learn to drive without millions of
logged driving miles or the ability to precisely predict the
trajectories of all dynamic actors in the scene. This suggests a
potential alternative path to learning robust driving policies which
does not rely on highly accurate world models or enormous driving
datasets -- one which leans into human preferences and expertise as
an untapped source of supervision for training driving policies.
This thesis aims to make the case for human feedback as a rich
signal for improving driving policies in a sample-efficient manner
without requiring high-fidelity simulation. First, we propose a
method for guiding driving policies at test-time using unseen
black-box reward functions. We can then synthesize reward functions
using natural language and optimize them online, allowing us to
solve novel tasks zero-shot using only language supervision. Next,
we show how driving policies can be fine-tuned offline using human
preference data. By eliciting preferences over high-level intents,
we can use human feedback to effectively relabel sub-optimal driving
demonstrations and improve on-road driving performance. As future
work, we aim to combine these two methods to fine-tune driving
policies offline using natural language corrections, which should
enable richer feedback over longer horizons and chain-of-thought
distillation.
*Thesis Committee Members:*
Katerina Fragkiadaki, Co-chair
Jeff Schneider, Co-chair
Maxim Likhachev
Philipp Krähenbühl, The University of Texas at Austin
A draft of the thesis proposal document is available here
<https://drive.google.com/file/d/1n4mehD6jWXFZp3-ksjIecbxU7mIXbcxB/view?usp=sharing>.