[Research] proposal talk.

Robin Sabhnani sabhnani+ at cs.cmu.edu
Fri Aug 27 10:15:02 EDT 2010


Hi all,

I am giving my thesis proposal talk this afternoon. You are welcome to 
attend it. See announcement below.

####################

Date: 8/27/10
Time: 3:00pm
Place: 4405 GHC

PhD Candidate: Maheshkumar (Robin) Sabhnani

Title: Disjunctive Anomaly Detection: Identifying Complex Anomalous 
Patterns

Abstract:

The problem of anomaly detection in multivariate time series data is 
common to many applications of practical interest. Examples include 
network intrusion detection systems, manufacturing processes, climate 
studies, syndromic surveillance, and video stream processing. Our 
motivating application is syndromic surveillance, which aims to detect 
potential disease outbreaks in pre-diagnosis data to facilitate timely 
public health response. To achieve this goal, efficient data structures 
and smart algorithms are needed to analyze highly multivariate temporal 
data.

In this thesis work, we introduce Disjunctive Anomaly Detection (DAD), 
an algorithm for detecting complex anomalous clusters in multivariate 
datasets with categorical dimensions. Our proposed algorithm assumes 
that an anomalous cluster can affect any subset of data dimensions 
(using conjunctions) and any subset of values (using disjunctions) along 
each data dimension. We believe that such a cluster definition is more 
informative about real outbreaks than current approaches. 
In addition, the DAD algorithm models multiple anomalous clusters 
simultaneously, hence promising better detection power in the presence 
of multiple overlapping anomalous events. So far, we have compared the 
DAD algorithm against relevant, powerful alternatives on two important 
tasks: finding sample-variable associations in cancer microarray data, 
and searching for emerging disease outbreaks in public health data. 
Experimental results indicate that DAD is able to detect and explain 
complex anomalous clusters better than alternative approaches such 
as the Large Average Submatrix (LAS) algorithm and the What's Strange 
About Recent Events (WSARE) algorithm.
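
To make the cluster definition above concrete, here is a minimal sketch 
of the membership test only: a record belongs to a cluster if, for every 
constrained dimension (conjunction), its value is one of the allowed 
values (disjunction). This is not the DAD algorithm itself, which 
searches for and scores such clusters. The names Record, Cluster, and 
in_cluster, and the example dimensions and values, are assumptions made 
up for illustration.

```python
from typing import Dict, Set

# Hypothetical types: a record maps each dimension to one categorical value;
# a cluster maps each constrained dimension to its set of allowed values.
Record = Dict[str, str]
Cluster = Dict[str, Set[str]]


def in_cluster(record: Record, cluster: Cluster) -> bool:
    """Conjunction over constrained dimensions, disjunction over values.

    Dimensions absent from the cluster are unconstrained, so an empty
    cluster matches every record.
    """
    return all(record.get(dim) in allowed for dim, allowed in cluster.items())


# Illustrative (made-up) pre-diagnosis records.
records = [
    {"age": "child", "syndrome": "respiratory", "zip": "15213"},
    {"age": "adult", "syndrome": "respiratory", "zip": "15213"},
    {"age": "senior", "syndrome": "gastrointestinal", "zip": "15217"},
    {"age": "child", "syndrome": "fever", "zip": "15217"},
]

# (age is child OR senior) AND (syndrome is respiratory OR fever);
# the zip dimension is left unconstrained.
cluster = {"age": {"child", "senior"}, "syndrome": {"respiratory", "fever"}}

matches = [r for r in records if in_cluster(r, cluster)]
```

Under these assumptions, a cluster that leaves a dimension out simply 
places no constraint on it, which is one way to express "any subset of 
data dimensions".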

To assist in the development of future complex multidimensional and 
multivariate algorithms (including extensions to DAD), we also introduce 
the T-Cube data structure that efficiently represents any time series 
data with multiple categorical dimensions (typical in many fields of 
application including surveillance). The T-Cube data structure (inspired 
by AD-Trees for categorical count data) acts as a cache and quickly 
responds to ad hoc queries during an investigation. It enables 
processing of millions of time series during massive data mining 
operations. We have successfully applied T-Cube to mine interesting 
patterns in diverse projects involving temporal event data.
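
The idea of caching time series so that ad hoc queries can be answered 
without rescanning raw events can be sketched as follows. This is a 
plain dictionary of pre-aggregated daily counts, not the T-Cube or 
AD-Tree structure, whose internals are not described here. The names 
DIMS, N_DAYS, build_cache, and query, and the example events, are 
assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

DIMS = ("age", "syndrome")  # hypothetical categorical dimensions
N_DAYS = 3                  # hypothetical length of the time window

Key = Tuple[str, str]
Event = Tuple[int, str, str]  # (day index, age group, syndrome)


def build_cache(events: Iterable[Event]) -> Dict[Key, List[int]]:
    """Pre-aggregate events into one daily count series per value combination."""
    cache: Dict[Key, List[int]] = defaultdict(lambda: [0] * N_DAYS)
    for day, age, syndrome in events:
        cache[(age, syndrome)][day] += 1
    return cache


def query(cache: Dict[Key, List[int]], constraints: Dict[str, Set[str]]) -> List[int]:
    """Sum the cached series whose values satisfy every constraint.

    Dimensions missing from `constraints` are unconstrained.
    """
    total = [0] * N_DAYS
    for key, series in cache.items():
        if all(key[DIMS.index(dim)] in allowed
               for dim, allowed in constraints.items()):
            total = [t + s for t, s in zip(total, series)]
    return total


# Illustrative (made-up) events.
events = [
    (0, "child", "respiratory"),
    (0, "adult", "respiratory"),
    (1, "child", "fever"),
    (1, "child", "respiratory"),
    (2, "senior", "fever"),
]

cache = build_cache(events)
series = query(cache, {"age": {"child", "senior"}})
```

In this sketch each query scans every cached combination, so it only 
illustrates the query interface; it does not reflect how a tree-based 
cache would avoid that scan.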

Thesis Committee:
Artur Dubrawski (Co-chair)
Jeff Schneider (Co-chair)
Aarti Singh
Greg Cooper (University of Pittsburgh)


