<div dir="ltr">Team,<div><br></div><div>Please come and see Sebastian give an excellent talk about his proposed thesis work.</div><div>It takes place tomorrow at 10:30am on Zoom.<br><br>Cheers</div><div>Artur</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <strong class="gmail_sendername" dir="auto">Diane Stidle</strong> <span dir="auto">&lt;<a href="mailto:stidle@andrew.cmu.edu">stidle@andrew.cmu.edu</a>&gt;</span><br>Date: Mon, Dec 5, 2022 at 2:33 PM<br>Subject: Thesis Proposal - Dec. 16, 2022 - Sebastian Caldas - Collaborative learning by leveraging siloed data<br>To: <a href="mailto:ml-seminar@cs.cmu.edu">ml-seminar@cs.cmu.edu</a> &lt;<a href="mailto:ML-SEMINAR@cs.cmu.edu">ML-SEMINAR@cs.cmu.edu</a>&gt;,  &lt;<a href="mailto:cler@pitt.edu">cler@pitt.edu</a>&gt;,  &lt;<a href="mailto:martin.jaggi@epfl.ch">martin.jaggi@epfl.ch</a>&gt;<br></div><br><br>
  

    
  
  <div>
    <p><i><b>Thesis Proposal</b></i></p>
    <p>Date: December 16, 2022<br>
      Time: 10:30am (EST)<br>
      Remote Only<br>
      Speaker: Sebastian Caldas</p>
    <p><b>Title: Collaborative learning by leveraging siloed data</b></p>
    <p>Abstract:<br>
      <span id="m_115094846670809040gmail-docs-internal-guid-62de273d-7fff-5e6e-d607-8e56488fd9f8">
        </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Data holders cannot always share the data that they own, which can ultimately limit the modeling capabilities of each holder. For example, a hospital may lack representative records to learn about a new or rare condition, or a single mobile device may not have enough input to train a useful language model about its user. In both cases, the siloed data holders would benefit from collaborating with others to leverage their data.</span></p>
        <br>
        <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">In recent years, the field of federated learning has taken an interest in learning performant collaborative models from siloed data. For these models to be truly useful, however, they must provide utility along dimensions beyond predictive performance, such as confidentiality, fairness and privacy. In this thesis proposal, I will demonstrate how to improve the utility of collaborative models that leverage siloed data, focusing on three dimensions of utility that are of current relevance to collaborative contexts:</span></p>
        <br>
        <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Explanations: We combine explanations with predictive performance in pursuit of true clinical utility. To this end, we introduced FRCLS, an algorithm that explicitly identifies when a prediction uses knowledge from an external collaborator and that provides interpretable rules delineating the subpopulations for which that external knowledge is useful. We have demonstrated the efficacy of FRCLS on a variety of clinical tasks, including early prediction of sepsis and prediction of overly long lengths of stay.</span></p>
        <br>
        <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Expert supervision: We encode domain knowledge into on-device data, enabling collaborative learning for a wider variety of problems. We encode this knowledge by leveraging heuristics curated by experts. We first learn which heuristics will be useful for the devices’ data and then train a weakly supervised federated model using these heuristics.</span></p>
        <br>
        <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Communication constraints: To complete my dissertation, I propose to study settings where collaborators can exchange only a limited number of communication rounds, as is seen in clinical settings with limited infrastructure. I propose to develop an adaptive knowledge distillation strategy and to demonstrate it in a healthcare application context.</span></p>
        <br>
        <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><b>Thesis
            Committee:</b><br>
          Artur Dubrawski (chair)<br>
          Virginia Smith<br>
          Gilles Clermont (University of Pittsburgh)<br>
          Martin Jaggi (EPFL) <br>
        </p>
    <p><b>Zoom meeting link:</b> <a href="https://cmu.zoom.us/j/94957077221?pwd=SkdHZitvNkR2Zm9lSXMyUGtPUldjQT09" target="_blank">https://cmu.zoom.us/j/94957077221?pwd=SkdHZitvNkR2Zm9lSXMyUGtPUldjQT09</a></p>
    <p><b>Link to the draft document: </b><a href="https://drive.google.com/file/d/1Wu_ysaVm22G5PgOpTJbHy14DJ-uvr34U/view?usp=sharing" target="_blank">https://drive.google.com/file/d/1Wu_ysaVm22G5PgOpTJbHy14DJ-uvr34U/view?usp=sharing</a>
      <br>
    </p>
  </div>

</div></div></div>