Fwd: RI Ph.D. Thesis Proposal: Matt Barnes
Artur Dubrawski
awd at cs.cmu.edu
Tue Nov 21 11:56:29 EST 2017
Team,
Happy Thanksgiving!
+ please mark your calendars for Monday next week.
Attending Matt's proposal talk will surely help us burn the excess
calories from our turkey dinners.
Cheers,
Artur
-------- Forwarded Message --------
Subject: RI Ph.D. Thesis Proposal: Matt Barnes
Date: Mon, 20 Nov 2017 15:37:18 +0000
From: Suzanne Muth <lyonsmuth at cmu.edu>
To: ri-people at cs.cmu.edu <ri-people at cs.cmu.edu>
Date: 27 November 2017
Time: 4:00 p.m.
Place: Gates Hillman Center 4405
Type: Ph.D. Thesis Proposal
Who: Matt Barnes
Topic: Learning with Clusters
Abstract:
As machine learning becomes more ubiquitous, clustering has evolved from
primarily a data analysis tool into an integrated component of complex
machine learning systems, including those involving dimensionality
reduction, anomaly detection, network analysis, image segmentation, and
classification of groups of data. With this integration into multi-stage
systems comes a need to better understand interactions between pipeline
components. Changing parameters of the clustering algorithm will impact
downstream components and, quite unfortunately, it is usually not
possible to simply back-propagate through the entire system. Currently,
as with many machine learning systems, the output of the clustering
algorithm is taken as ground truth at the next pipeline step. Our
empirical results show this false assumption may have dramatic empirical
impacts -- sometimes biasing results by upwards of 25%.
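As a minimal illustration of this setting (a sketch added here for
readability, not material from the proposal), consider a two-stage pipeline
in Python, assuming scikit-learn is available; the data, parameters, and
model choices below are arbitrary:

# Illustrative only: cluster assignments are treated as ground truth
# by the downstream learner, so any clustering errors propagate
# silently into the second stage.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) + np.repeat([[0.0, 0.0], [3.0, 3.0]], 250, axis=0)

# Stage 1: cluster the data (hyperparameters chosen arbitrarily).
pseudo_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Stage 2: the downstream learner takes the pseudo-labels as ground truth;
# mis-clustered points become mislabeled training examples.
clf = LogisticRegression().fit(X, pseudo_labels)

The downstream classifier has no way to know which of its training labels
came from clustering mistakes, which is exactly the gap described next.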
We address this gap by developing estimators and methods that both
quantify and correct for the impact of clustering errors on downstream
learners. Our work is agnostic to the downstream learner and makes few
assumptions about the clustering algorithm. Theoretical and empirical
results demonstrate that our methods and estimators outperform the
current naive approaches, which do not account for clustering errors.
Along these lines, we also develop several new clustering algorithms and
prove theoretical bounds for existing algorithms, whose outputs serve as
inputs to our error-correction methods. Not surprisingly, we find that
learning on clusters of data is both theoretically and empirically easier
as the number of clustering errors decreases. Our work is thus two-fold:
we aim both to provide the best clustering possible and to learn on the
inevitably noisy clusters that remain.
A major limiting factor of our error-correction methods is scalability.
Currently, their computational complexity is O(n^3), where n is the size
of the training dataset, which restricts them to small machine learning
problems. We propose to address this scalability issue through
approximation: it should be possible to reduce the computational
complexity to O(p^3), where p is the number of parameters in the
approximation, a small constant independent of n.
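To make the proposed scaling concrete (the specific numbers here are
illustrative assumptions, not figures from the proposal): for a modest
training set of n = 10,000 points, an O(n^3) method performs on the order
of 10^12 operations, whereas an approximation with, say, p = 100
parameters would need on the order of 10^6 operations, regardless of how
large n grows.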
Thesis Committee Members:
Artur Dubrawski, Chair
Geoff Gordon
Kris Kitani
Beka Steorts, Duke University
A copy of the thesis proposal document is available at:
http://goo.gl/MpwTCN