Fwd: RI Ph.D. Thesis Defense: Matt Barnes

Artur Dubrawski awd at cs.cmu.edu
Mon Dec 3 16:54:04 EST 2018


Team,

Do not miss this joyful event!

Artur

---------- Forwarded message ---------
From: Suzanne Muth <lyonsmuth at cmu.edu>
Date: Mon, Dec 3, 2018 at 1:42 PM
Subject: RI Ph.D. Thesis Defense: Matt Barnes
To: ri-people at cs.cmu.edu <ri-people at cs.cmu.edu>


Date: 10 December 2018
Time: 9:00 a.m.
Place: GHC 8102
Type: Ph.D. Thesis Defense
Who: Matt Barnes
Topic: Learning with Clusters


Abstract:
Clustering, the problem of grouping similar data, has been extensively
studied since at least the 1950s. As machine learning becomes more
prominent, clustering has evolved from primarily a data analysis tool into
an integrated component of complex robotic and machine learning systems,
including dimensionality reduction, anomaly detection, network analysis,
image segmentation, and classification of grouped data.

With this integration into multi-stage systems comes a need to better
understand interactions between pipeline components. Changing the
parameters of the clustering algorithm will impact downstream components
and, quite unfortunately, it is usually not possible to simply
backpropagate through the entire system. Instead, it is common practice to
take the output of the clustering algorithm as ground truth at the next
module of the pipeline. We show that this false assumption causes subtle
and dangerous behavior in even the simplest systems, empirically biasing
results by upwards of 25%.
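
As a concrete illustration of this failure mode, here is a minimal Python
sketch (not an experiment from the thesis): a downstream learner estimates
a per-group mean from imperfect cluster assignments. The error rate err
and the symmetric-flip noise model are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # True latent group for each point, with a group-dependent outcome.
    true_group = rng.integers(0, 2, size=n)
    y = np.where(true_group == 0, 1.0, 3.0) + rng.normal(0.0, 0.1, size=n)

    # Imperfect clustering: each point lands in the wrong cluster with
    # probability err (a stand-in for real clustering errors).
    err = 0.2
    flip = rng.random(n) < err
    cluster = np.where(flip, 1 - true_group, true_group)

    # Downstream "learner": estimate the mean outcome of group 0.
    print(y[true_group == 0].mean())  # ~1.0, using the true groups
    print(y[cluster == 0].mean())     # ~1.4, biased when noisy clusters
                                      # are treated as ground truth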

We address this gap by developing scalable estimators and methods to both
quantify and compensate for the impact of clustering errors on downstream
learners. Our work is agnostic to the choice of the other components of
the machine learning system, and requires few assumptions on the
clustering algorithm. Theoretical and empirical results demonstrate that
our methods and estimators are superior to the current naive approaches,
which do not account for clustering errors.
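
The thesis document linked below describes the actual estimators. As a
purely hypothetical sketch of the general idea of compensating for
clustering errors, the observed per-cluster means can be unmixed once the
error rate is known or estimated. The symmetric-flip noise model, equal
group sizes, and the unmix_means helper are illustrative assumptions, not
the thesis's method:

    import numpy as np

    def unmix_means(y, cluster, err):
        # Observed per-cluster means are a mixture of the true per-group
        # means: observed = M @ true, where M holds the assignment
        # probabilities under symmetric flips with rate err and equal
        # group sizes. Solving the linear system undoes the mixing.
        observed = np.array([y[cluster == 0].mean(),
                             y[cluster == 1].mean()])
        M = np.array([[1.0 - err, err],
                      [err, 1.0 - err]])
        return np.linalg.solve(M, observed)

    # Applied to y, cluster, and err from the sketch above, this recovers
    # means close to the true 1.0 and 3.0 instead of the biased ~1.4.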

We also develop several new clustering algorithms and prove theoretical
bounds for existing algorithms, to be used as inputs to our
error-correction methods. Unsurprisingly, we find that learning on
clusters of data becomes both theoretically and empirically easier as the
number of clustering errors decreases. Thus, our work is twofold: we aim
to provide the best clustering possible and to establish how to learn
effectively on inevitably noisy clusters.


Thesis Committee Members:
Artur Dubrawski, Chair
Geoff Gordon
Kris Kitani
Beka Steorts, Duke University


A copy of the thesis document is available at:

http://goo.gl/cNPSfY

