Fwd: RI Ph.D. Thesis Proposal: Matt Barnes
Artur Dubrawski
awd at cs.cmu.edu
Mon Nov 27 10:36:52 EST 2017
It is today, 4pm, Gates 4405!
A.
On 11/21/2017 11:56 AM, Artur Dubrawski wrote:
> Team,
>
> Happy Thanksgiving!
>
> + please mark your calendars for Monday next week.
> Attending Matt's proposal talk will surely help us burn the excess
> calories from our turkey dinners.
>
> Cheers,
> Artur
>
>
> -------- Forwarded Message --------
> Subject: RI Ph.D. Thesis Proposal: Matt Barnes
> Date: Mon, 20 Nov 2017 15:37:18 +0000
> From: Suzanne Muth <lyonsmuth at cmu.edu>
> To: ri-people at cs.cmu.edu <ri-people at cs.cmu.edu>
>
>
>
> Date: 27 November 2017
>
> Time: 4:00 p.m.
>
> Place: Gates Hillman Center 4405
>
> Type: Ph.D. Thesis Proposal
>
> Who: Matt Barnes
>
> Topic: Learning with Clusters
>
>
> Abstract:
>
> As machine learning becomes more ubiquitous, clustering has evolved
> from primarily a data analysis tool into an integrated component of
> complex machine learning systems, including those involving
> dimensionality reduction, anomaly detection, network analysis, image
> segmentation, and classification of groups of data. With this integration
> into multi-stage systems comes a need to better understand
> interactions between pipeline components. Changing parameters of the
> clustering algorithm will impact downstream components and, quite
> unfortunately, it is usually not possible to simply back-propagate
> through the entire system. Currently, as with many machine learning
> systems, the output of the clustering algorithm is taken as ground
> truth at the next pipeline step. Our empirical results show this false
> assumption can have dramatic consequences, sometimes biasing results
> by upwards of 25%.
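>
> A purely illustrative sketch of the kind of pipeline described above
> (not the method proposed in this work): cluster toy data with
> scikit-learn's KMeans, then let a downstream step treat the cluster
> assignments as if they were ground-truth group labels. Because the
> clusters overlap, the downstream per-group estimates are biased
> relative to the true group means.
>
>     import numpy as np
>     from sklearn.cluster import KMeans
>
>     # Toy data: two overlapping groups with outcome means 0.0 and 1.0.
>     rng = np.random.default_rng(0)
>     X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
>                    rng.normal(1.5, 1.0, (200, 2))])
>     y = np.concatenate([np.full(200, 0.0), np.full(200, 1.0)])
>
>     # Stage 1: clustering. Stage 2: a downstream estimator that takes
>     # the cluster assignments as the true group labels.
>     labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
>     group_means = [y[labels == k].mean() for k in range(2)]
>
>     # Misassigned points pull the estimates away from the true means.
>     print(group_means)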
>
> We address this gap by developing estimators and methods to both
> quantify and correct for clustering errors' impacts on downstream
> learners. Our work is agnostic to the downstream learner and makes
> few assumptions about the clustering algorithm. Theoretical and
> empirical results demonstrate our methods and estimators are superior
> to the current naive approaches, which do not account for clustering
> errors.
>
> Along these lines, we also develop several new clustering algorithms
> and prove theoretical bounds for existing algorithms, to be used as
> inputs to our later error-correction methods. Not surprisingly, we
> find learning on clusters of data is both theoretically and
> empirically easier as the number of clustering errors decreases. Thus,
> our work is two-fold: we attempt both to provide the best clustering
> possible and to learn on inevitably noisy clusters.
>
> A major limiting factor in our error-correction methods is
> scalability. Currently, their computational complexity is O(n^3) where
> n is the size of the training dataset. This limits their applicability
> to very small machine learning problems. We propose addressing this
> scalability issue through approximation. It should be possible to
> reduce the computational complexity to O(p^3), where p is a small
> constant, independent of n, corresponding to the number of parameters
> in the approximation.
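>
> One standard way such a reduction can be achieved (offered only as a
> hypothetical illustration, not as the approach this proposal commits
> to) is to exploit low-rank structure: when the relevant n x n matrix
> is identity-plus-rank-p, the Woodbury identity replaces the O(n^3)
> solve with a p x p solve, plus matrix products that are linear in n.
>
>     import numpy as np
>
>     # Hypothetical illustration: if K = I_n + U U^T with U of size n x p,
>     # the Woodbury identity gives K^{-1} b = b - U (I_p + U^T U)^{-1} U^T b,
>     # so the cubic cost is paid only on a p x p system.
>     rng = np.random.default_rng(0)
>     n, p = 2000, 20
>     U = rng.normal(size=(n, p))
>     K = np.eye(n) + U @ U.T
>     b = rng.normal(size=n)
>
>     small = np.eye(p) + U.T @ U                       # p x p system
>     x_fast = b - U @ np.linalg.solve(small, U.T @ b)  # O(p^3 + n p^2)
>     x_slow = np.linalg.solve(K, b)                    # O(n^3), for comparison
>
>     print(np.allclose(x_fast, x_slow))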
>
>
>
> Thesis Committee Members:
>
> Artur Dubrawski, Chair
>
> Geoff Gordon
>
> Kris Kitani
>
> Beka Steorts, Duke University
>
>
>
> A copy of the thesis proposal document is available at:
>
> http://goo.gl/MpwTCN
>