Fwd: RI Ph.D. Thesis Proposal: Matt Barnes

Artur Dubrawski awd at cs.cmu.edu
Tue Nov 21 11:56:29 EST 2017


Team,

Happy Thanksgiving!

+ Please mark your calendars for Monday next week.
Attending Matt's proposal talk will surely help us burn the excess 
calories from our turkey dinners.

Cheers,
Artur


-------- Forwarded Message --------
Subject: 	RI Ph.D. Thesis Proposal: Matt Barnes
Date: 	Mon, 20 Nov 2017 15:37:18 +0000
From: 	Suzanne Muth <lyonsmuth at cmu.edu>
To: 	ri-people at cs.cmu.edu <ri-people at cs.cmu.edu>



Date:   27 November 2017

Time:   4:00 p.m.

Place:  Gates Hillman Center 4405

Type:   Ph.D. Thesis Proposal

Who:   Matt Barnes

Topic:  Learning with Clusters


Abstract:

As machine learning becomes more ubiquitous, clustering has evolved
from primarily a data analysis tool into an integrated component of
complex machine learning systems, including those for dimensionality
reduction, anomaly detection, network analysis, image segmentation,
and classification over groups of data. With this integration into
multi-stage systems comes a need to better understand the
interactions between pipeline components. Changing the parameters of
the clustering algorithm will impact downstream components, and,
unfortunately, it is usually not possible to simply back-propagate
through the entire system. Currently, as in many machine learning
systems, the output of the clustering algorithm is taken as ground
truth at the next pipeline step. Our experiments show that this
false assumption can have dramatic impacts, sometimes biasing
results by upwards of 25%.
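
As a minimal synthetic sketch of this failure mode (an illustration,
not taken from the proposal), consider a downstream estimate, the
pooled within-group variance, that treats the clustering as ground
truth; a modest rate of erroneous merges badly inflates it:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: 200 entities, each observed 5 times with
    # unit-variance noise around an entity-specific mean.
    n_entities, n_obs = 200, 5
    means = rng.normal(0.0, 10.0, size=n_entities)
    records = means[:, None] + rng.normal(0.0, 1.0,
                                          size=(n_entities, n_obs))

    def pooled_within_group_variance(groups):
        # Downstream estimate, taking the grouping as ground truth.
        return np.mean([g.var(ddof=1) for g in groups])

    # With the true clustering, the estimate recovers roughly 1.0.
    print(pooled_within_group_variance(list(records)))

    # Simulate clustering errors: merge ~10% of adjacent entities.
    noisy_groups, i = [], 0
    while i < n_entities:
        if i + 1 < n_entities and rng.random() < 0.10:
            noisy_groups.append(
                np.concatenate([records[i], records[i + 1]]))
            i += 2
        else:
            noisy_groups.append(records[i])
            i += 1

    # The erroneous merges dramatically inflate the estimate.
    print(pooled_within_group_variance(noisy_groups))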

We address this gap by developing estimators and methods to both
quantify and correct for the impact of clustering errors on
downstream learners. Our approach is agnostic to the downstream
learner and requires few assumptions about the clustering algorithm.
Theoretical and empirical results demonstrate that our methods and
estimators are superior to the current naive approaches, which do
not account for clustering errors.
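
The proposal does not spell out these estimators, but as a generic
illustration of the quantify-and-correct idea: suppose binary labels
derived from a noisy clustering flip with a known rate rho. The
naive frequency estimate is then biased toward 1/2, and inverting
the noise channel removes the bias:

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical illustration: labels flip with known rate rho.
    n, rho = 100_000, 0.2
    true_labels = rng.random(n) < 0.3    # true class-1 rate: 0.3
    flips = rng.random(n) < rho
    noisy_labels = np.where(flips, ~true_labels, true_labels)

    naive = noisy_labels.mean()          # biased toward 0.5 (~0.38)
    corrected = (naive - rho) / (1 - 2 * rho)

    print(f"naive: {naive:.3f}  corrected: {corrected:.3f}")

The same channel-inversion idea underlies standard loss corrections
for learning with label noise, one plausible ingredient here.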

Along these lines, we also develop several new clustering algorithms
and prove theoretical bounds for existing algorithms, to be used as
inputs to our error-correction methods. Not surprisingly, we find
that learning on clustered data becomes both theoretically and
empirically easier as the number of clustering errors decreases. Our
work is thus two-fold: we attempt both to provide the best
clustering possible and to learn on the inevitably noisy clusters.

A major limiting factor in our error-correction methods is
scalability. Currently, their computational complexity is O(n^3),
where n is the size of the training dataset, which restricts them to
very small machine learning problems. We propose addressing this
scalability issue through approximation. It should be possible to
reduce the computational complexity to O(p^3), where p is the number
of parameters in the approximation, a small constant independent
of n.
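
The proposal does not commit to a particular approximation; one
standard pattern with exactly this cost profile is a Nystroem-style
landmark approximation, sketched below under that assumption. Only
n x p and p x p blocks are ever formed, so the cubic cost falls on
the small p x p block:

    import numpy as np

    rng = np.random.default_rng(2)

    def rbf_kernel(A, B, gamma=0.5):
        # Gaussian kernel between the rows of A and the rows of B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    n, p = 2000, 50                      # p is independent of n
    X = rng.normal(size=(n, 3))
    landmarks = X[rng.choice(n, size=p, replace=False)]

    C = rbf_kernel(X, landmarks)         # n x p block
    W = rbf_kernel(landmarks, landmarks) # p x p block
    W_inv = np.linalg.pinv(W)            # the only cubic step: O(p^3)

    # K is approximated by C W^{-1} C^T without forming the n x n
    # matrix, so a product K v costs O(n p) rather than O(n^2).
    v = rng.normal(size=n)
    Kv = C @ (W_inv @ (C.T @ v))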



Thesis Committee Members:

Artur Dubrawski, Chair

Geoff Gordon

Kris Kitani

Beka Steorts, Duke University



A copy of the thesis proposal document is available at:

http://goo.gl/MpwTCN
