Fwd: RI Ph.D. Thesis Proposal: Matt Barnes
Artur Dubrawski
awd at cs.cmu.edu
Mon Nov 27 10:36:52 EST 2017
It is today, 4pm, Gates 4405!
A.
On 11/21/2017 11:56 AM, Artur Dubrawski wrote:
> Team,
>
> Happy Thanksgiving!
>
> + please mark your calendars for Monday next week.
> Attending Matt's proposal talk will surely help us burn the excess
> calories from our turkey dinners.
>
> Cheers,
> Artur
>
>
> -------- Forwarded Message --------
> Subject: RI Ph.D. Thesis Proposal: Matt Barnes
> Date: Mon, 20 Nov 2017 15:37:18 +0000
> From: Suzanne Muth <lyonsmuth at cmu.edu>
> To: ri-people at cs.cmu.edu <ri-people at cs.cmu.edu>
>
>
>
> Date: 27 November 2017
>
> Time: 4:00 p.m.
>
> Place: Gates Hillman Center 4405
>
> Type: Ph.D. Thesis Proposal
>
> Who: Matt Barnes
>
> Topic: Learning with Clusters
>
>
> Abstract:
>
> As machine learning becomes more ubiquitous, clustering has evolved
> from primarily a data analysis tool into an integrated component of
> complex machine learning systems, including those involving
> dimensionality reduction, anomaly detection, network analysis, image
> segmentation, and classification of groups of data. With this integration
> into multi-stage systems comes a need to better understand
> interactions between pipeline components. Changing parameters of the
> clustering algorithm will impact downstream components and, quite
> unfortunately, it is usually not possible to simply back-propagate
> through the entire system. Currently, as with many machine learning
> systems, the output of the clustering algorithm is taken as ground
> truth at the next pipeline step. Our empirical results show this false
> assumption can have dramatic consequences, sometimes biasing results
> by upwards of 25%.
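>
> A purely illustrative sketch of the kind of pipeline described above
> (not the method proposed in this work): cluster toy data with
> scikit-learn's KMeans, then let a downstream step treat the cluster
> assignments as if they were ground-truth group labels. Because the
> clusters overlap, the downstream per-group estimates are biased
> relative to the true group means.
>
>     import numpy as np
>     from sklearn.cluster import KMeans
>
>     # Toy data: two overlapping groups with outcome means 0.0 and 1.0.
>     rng = np.random.default_rng(0)
>     X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
>                    rng.normal(1.5, 1.0, (200, 2))])
>     y = np.concatenate([np.full(200, 0.0), np.full(200, 1.0)])
>
>     # Stage 1: clustering. Stage 2: a downstream estimator that takes
>     # the cluster assignments as the true group labels.
>     labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
>     group_means = [y[labels == k].mean() for k in range(2)]
>
>     # Misassigned points pull the estimates away from the true means.
>     print(group_means)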
>
> We address this gap by developing estimators and methods to both
> quantify and correct for clustering errors' impacts on downstream
> learners. Our work is agnostic to the downstream learner and makes
> few assumptions about the clustering algorithm. Theoretical and
> empirical results demonstrate our methods and estimators are superior
> to the current naive approaches, which do not account for clustering
> errors.
>
> Along these lines, we also develop several new clustering algorithms
> and prove theoretical bounds for existing algorithms, to be used as
> inputs to our later error-correction methods. Not surprisingly, we
> find learning on clusters of data is both theoretically and
> empirically easier as the number of clustering errors decreases. Thus,
> our work is two-fold: we attempt both to provide the best clustering
> possible and to learn on inevitably noisy clusters.
>
> A major limiting factor in our error-correction methods is
> scalability. Currently, their computational complexity is O(n^3) where
> n is the size of the training dataset. This limits their applicability
> to very small machine learning problems. We propose addressing this
> scalability issue through approximation. It should be possible to
> reduce the computational complexity to O(p^3), where p is a small
> constant, independent of n, corresponding to the number of parameters
> in the approximation.
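>
> One standard way such a reduction can be achieved (offered only as a
> hypothetical illustration, not as the approach this proposal commits
> to) is to exploit low-rank structure: when the relevant n x n matrix
> is identity-plus-rank-p, the Woodbury identity replaces the O(n^3)
> solve with a p x p solve, plus matrix products that are linear in n.
>
>     import numpy as np
>
>     # Hypothetical illustration: if K = I_n + U U^T with U of size n x p,
>     # the Woodbury identity gives K^{-1} b = b - U (I_p + U^T U)^{-1} U^T b,
>     # so the cubic cost is paid only on a p x p system.
>     rng = np.random.default_rng(0)
>     n, p = 2000, 20
>     U = rng.normal(size=(n, p))
>     K = np.eye(n) + U @ U.T
>     b = rng.normal(size=n)
>
>     small = np.eye(p) + U.T @ U                       # p x p system
>     x_fast = b - U @ np.linalg.solve(small, U.T @ b)  # O(p^3 + n p^2)
>     x_slow = np.linalg.solve(K, b)                    # O(n^3), for comparison
>
>     print(np.allclose(x_fast, x_slow))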
>
>
>
> Thesis Committee Members:
>
> Artur Dubrawski, Chair
>
> Geoff Gordon
>
> Kris Kitani
>
> Beka Steorts, Duke University
>
>
>
> A copy of the thesis proposal document is available at:
>
> http://goo.gl/MpwTCN
>