Fwd: Thesis Proposal - Dec. 16, 2022 - Sebastian Caldas - Collaborative learning by leveraging siloed data

Mon Dec 5 14:59:28 EST 2022

An important and ingenious talk ahead of us!

Artur

---------- Forwarded message ---------
From: Diane Stidle <stidle at andrew.cmu.edu>
Date: Mon, Dec 5, 2022 at 2:33 PM
Subject: Thesis Proposal - Dec. 16, 2022 - Sebastian Caldas - Collaborative
learning by leveraging siloed data
To: ml-seminar at cs.cmu.edu <ML-SEMINAR at cs.cmu.edu>, <cler at pitt.edu>,
<martin.jaggi at epfl.ch>

*Thesis Proposal*

Date: December 16, 2022
Time: 10:30am (EST)
Remote Only
Speaker: Sebastian Caldas

*Title: Collaborative learning by leveraging siloed data*

Abstract:

Data holders cannot always share the data that they own, which can
ultimately limit the modeling capabilities of each holder. For example, a
hospital may lack representative records to learn about a new or rare
condition, or a single mobile device may not have enough input to train a
useful language model about its user. In both cases, the siloed data
holders would benefit from collaborating with others to leverage their
data.

In recent years, the field of federated learning has taken an interest in
learning performant collaborative models from siloed data. For these models
to be truly useful, however, they must provide utility along dimensions
beyond predictive performance, such as confidentiality, fairness and
privacy. In this thesis proposal, I will demonstrate how to improve the
utility of collaborative models that leverage siloed data, focusing on
three dimensions of utility that are of current relevance to collaborative
contexts:

Explanations: We combine explanations with predictive performance in
pursuit of true clinical utility. To this end, we introduced FRCLS, an
algorithm that explicitly identifies when a prediction uses knowledge
from an external collaborator and provides interpretable rules that
delineate the subpopulations for which that external knowledge is useful.
We have demonstrated the efficacy of FRCLS on a variety of clinical tasks,
including early prediction of sepsis and prediction of overly long lengths
of stay.

Expert supervision: We encode domain knowledge into on-device data,
enabling collaborative learning for a wider variety of problems. We encode
this knowledge by leveraging heuristics curated by experts. We first learn
which heuristics will be useful for the devices’ data and then train a
weakly supervised federated model using these heuristics.

Communication constraints: To complete my dissertation, I propose to study
settings where collaborators are limited in the number of communication
rounds they can exchange, as in clinical settings with limited
infrastructure. I propose to develop an adaptive knowledge distillation
strategy and to demonstrate it in a healthcare application context.

*Thesis Committee:*
Artur Dubrawski (chair)
Virginia Smith
Gilles Clermont (University of Pittsburgh)
Martin Jaggi (EPFL)

*Zoom meeting link:*
https://cmu.zoom.us/j/94957077221?pwd=SkdHZitvNkR2Zm9lSXMyUGtPUldjQT09

*Link to the draft document:*
https://drive.google.com/file/d/1Wu_ysaVm22G5PgOpTJbHy14DJ-uvr34U/view?usp=sharing