Fwd: RI Ph.D. Thesis Defense: Benedikt Boecking

Artur Dubrawski awd at cs.cmu.edu
Mon Nov 28 12:28:30 EST 2022


Please join Ben on his Big Day and see a really cool talk he is going to
give.

Cheers
Artur


---------- Forwarded message ---------
From: Suzanne Muth <lyonsmuth at cmu.edu>
Date: Mon, Nov 28, 2022 at 11:54 AM
Subject: RI Ph.D. Thesis Defense: Benedikt Boecking
To: <ri-people at lists.andrew.cmu.edu>


Date: 07 December 2022
Time: 1:00 p.m. (ET)
Location: NSH 4305
Zoom Link: https://cmu.zoom.us/j/96368686155?pwd=Zm9abDRRYWNJUkNqU2pIZmEvM0hpQT09
Type: Ph.D. Thesis Defense
Who: Benedikt Boecking
Title: Learning with Diverse Forms of Imperfect and Indirect Supervision

Abstract:
Powerful Machine Learning (ML) models trained on large, annotated datasets
have driven impressive advances in fields including natural language
processing and computer vision. In turn, such developments have led to
impactful applications of ML in areas such as healthcare, e-commerce, and
predictive maintenance. However, obtaining annotated datasets at the scale
required for training high capacity ML models is frequently a bottleneck
for promising applications of ML. In this thesis, I study alternative
pathways for acquiring domain knowledge and develop methodologies to enable
learning from weak supervision, i.e., imperfect and indirect forms of
supervision. I cover three forms of weak supervision: pairwise linkage
feedback, programmatic weak supervision, and paired multi-modal data. These
forms of information are often easy to obtain at scale, and the methods I
develop reduce, and in some cases eliminate, the need for pointillistic
ground truth annotations.

I begin by studying the utility of pairwise supervision. I introduce a new
constrained clustering method that uses a small number of
pairwise constraints to simultaneously learn a kernel and cluster data. The
method outperforms related approaches on a large and diverse group
of publicly available datasets. Next, I introduce imperfect pairwise
supervision to programmatic weak supervision label models. I
show empirically that just one source of weak pairwise feedback can lead to
significantly improved downstream performance.

I then further the study of programmatic data labeling methods by
introducing approaches that model the distribution of inputs in concert
with weak labels. I first introduce a framework for jointly learning a
label model and an end model from observed weak labels,
showing improvements over prior work in terms of end model performance on
downstream test sets. Next, I introduce a method that fuses
generative adversarial networks and programmatic weak supervision label
models to the benefit of both, measured by label model performance and
data generation quality.

In the last part of this thesis, I tackle a central challenge in
programmatic weak supervision: the need for experts to provide labeling
rules. First, I introduce an interactive learning framework that aids users
in discovering weak supervision sources to capture subject matter
experts’ knowledge of the application domain in an efficient fashion. I
then study the opportunity of dispensing with labeling functions altogether
by learning from unstructured natural language descriptions directly. In
particular, I study how biomedical text paired with images can be exploited
for self-supervised vision-language processing, yielding data-efficient
representations and enabling zero-shot classification, without requiring
experts to define rules on the text or images.

Together, these works provide novel methodologies and frameworks to encode
and use expert domain knowledge more efficiently in ML models, reducing the
bottleneck created by the need for manual ground truth annotations.

Thesis Committee Members:
Artur Dubrawski, Chair
Jeff Schneider
Barnabas Poczos
Hoifung Poon, Microsoft Research

A draft of the thesis defense document is available at:
https://drive.google.com/file/d/17DB_6gkfH7LPVzkt0adS0-O58pg_RSmE/view?usp=sharing
