REMINDER: RI Ph.D. Thesis Defense: Benedikt Boecking

Artur Dubrawski awd at cs.cmu.edu
Wed Dec 7 08:55:35 EST 2022


This is happening today at 1pm in a hybrid mode (Ben will be there in
person).

See you there!
Artur

On Mon, Nov 28, 2022 at 12:28 PM Artur Dubrawski <awd at cs.cmu.edu> wrote:

> Please join Ben on his Big Day and see a really cool talk he is going to
> give.
>
> Cheers
> Artur
>
>
> ---------- Forwarded message ---------
> From: Suzanne Muth <lyonsmuth at cmu.edu>
> Date: Mon, Nov 28, 2022 at 11:54 AM
> Subject: RI Ph.D. Thesis Defense: Benedikt Boecking
> To: <ri-people at lists.andrew.cmu.edu>
>
>
> Date: 07 December 2022
> Time: 1:00 p.m. (ET)
> Location: NSH 4305
> Zoom Link:
> https://cmu.zoom.us/j/96368686155?pwd=Zm9abDRRYWNJUkNqU2pIZmEvM0hpQT09
> Type: Ph.D. Thesis Defense
> Who: Benedikt Boecking
> Title: Learning with Diverse Forms of Imperfect and Indirect Supervision
>
> Abstract:
> Powerful Machine Learning (ML) models trained on large, annotated datasets
> have driven impressive advances in fields including natural language
> processing and computer vision. In turn, such developments have led to
> impactful applications of ML in areas such as healthcare, e-commerce, and
> predictive maintenance. However, obtaining annotated datasets at the scale
> required for training high-capacity ML models is frequently a bottleneck
> for promising applications of ML. In this thesis, I study alternative
> pathways for acquiring domain knowledge and develop methodologies to enable
> learning from weak supervision, i.e., imperfect and indirect forms of
> supervision. I cover three forms of weak supervision: pairwise linkage
> feedback, programmatic weak supervision, and paired multi-modal data. These
> forms of information are often easy to obtain at scale, and the methods I
> develop reduce, and in some cases eliminate, the need for pointillistic
> ground-truth annotations.
>
> I begin by studying the utility of pairwise supervision. I introduce a new
> constrained clustering method that uses a small number of pairwise
> constraints to simultaneously learn a kernel and cluster the data. The
> method outperforms related approaches on a large and diverse group
> of publicly available datasets. Next, I incorporate imperfect pairwise
> supervision into programmatic weak supervision label models. I
> show empirically that just one source of weak pairwise feedback can lead to
> significantly improved downstream performance.
>
> I then further the study of programmatic data labeling methods by
> introducing approaches that model the distribution of inputs in concert
> with weak labels. I first introduce a framework for jointly learning a
> label model and an end model from observed weak labels,
> showing improvements over prior work in terms of end model performance on
> downstream test sets. Next, I introduce a method that fuses
> generative adversarial networks and programmatic weak supervision label
> models to the benefit of both, measured by label model performance and
> data generation quality.
>
> In the last part of this thesis, I tackle a central challenge in
> programmatic weak supervision: the need for experts to provide labeling
> rules. First, I introduce an interactive learning framework that aids users
> in discovering weak supervision sources to capture subject-matter
> experts’ knowledge of the application domain efficiently. I
> then explore the possibility of dispensing with labeling functions altogether
> by learning from unstructured natural language descriptions directly. In
> particular, I study how biomedical text paired with images can be exploited
> for self-supervised vision-language processing, yielding data-efficient
> representations and enabling zero-shot classification, without requiring
> experts to define rules on the text or images.
>
> Together, these works provide novel methodologies and frameworks to encode
> and use expert domain knowledge more efficiently in ML models, reducing the
> bottleneck created by the need for manual ground truth annotations.
>
> Thesis Committee Members:
> Artur Dubrawski, Chair
> Jeff Schneider
> Barnabas Poczos
> Hoifung Poon, Microsoft Research
>
> A draft of the thesis defense document is available at:
>
> https://drive.google.com/file/d/17DB_6gkfH7LPVzkt0adS0-O58pg_RSmE/view?usp=sharing
>