[Research] proposal talk.
Robin Sabhnani
sabhnani+ at cs.cmu.edu
Fri Aug 27 10:15:02 EDT 2010
Hi all,
I am giving my thesis proposal talk this afternoon. You are welcome to
attend it. See announcement below.
####################
Date: 8/27/10
Time: 3:00pm
Place: 4405 GHC
PhD Candidate: Maheshkumar (Robin) Sabhnani
Title: Disjunctive Anomaly Detection: Identifying Complex Anomalous
Patterns
Abstract:
The problem of anomaly detection in multivariate time series data is
common to many applications of practical interest. Examples include
network intrusion detection systems, manufacturing processes, climate
studies, syndromic surveillance, and video stream processing.
Our motivating application is syndromic surveillance, which aims to detect
potential disease outbreaks in pre-diagnosis data to facilitate timely
public health response. To achieve this goal, efficient data structures
and smart algorithms are needed to analyze highly multivariate temporal
data.
In this thesis work, we introduce Disjunctive Anomaly Detection (DAD), an
algorithm for detecting complex anomalous clusters in multivariate
datasets with categorical dimensions. Our proposed algorithm assumes
that an anomalous cluster can affect any subset of data dimensions (using
conjunctions) and any subset of values (using disjunctions) along each
data dimension. We believe that such a cluster definition describes
real outbreaks more informatively than current approaches do.
In addition, the DAD algorithm models multiple anomalous clusters
simultaneously, hence promising better detection power in the presence
of multiple overlapping anomalous events. So far, we have compared the DAD
algorithm against strong, relevant alternatives on two important
tasks: finding sample-variable associations in cancer microarray data,
and searching for the emerging disease outbreaks in public health data.
Experimental results indicate that DAD is able to detect and explain
complex anomalous clusters better than the alternative approaches such
as the Large Average Submatrix (LAS) algorithm and the What's Strange
About Recent Events (WSARE) algorithm.
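
To make the cluster definition above concrete, here is a minimal sketch of
the membership test only, not of the DAD search or scoring procedure. It
assumes a cluster is stored as a mapping from each affected dimension to
its set of affected values. The names (matches, age_group, syndrome,
zip_code) and the records are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set

Record = Dict[str, str]          # one categorical value per dimension
Cluster = Dict[str, Set[str]]    # affected dimension -> affected values


def matches(record: Record, cluster: Cluster) -> bool:
    """Return True if the record falls inside the cluster.

    Conjunction across the constrained dimensions: every one must match.
    Disjunction within a dimension: any listed value is accepted.
    Dimensions absent from the cluster are unconstrained.
    """
    return all(record.get(dim) in values for dim, values in cluster.items())


# Hypothetical cluster: (child OR senior) AND respiratory, any zip code.
cluster: Cluster = {
    "age_group": {"child", "senior"},
    "syndrome": {"respiratory"},
}

# Hypothetical records, made up for this example.
records: List[Record] = [
    {"age_group": "child", "syndrome": "respiratory", "zip_code": "15213"},
    {"age_group": "adult", "syndrome": "respiratory", "zip_code": "15213"},
    {"age_group": "senior", "syndrome": "rash", "zip_code": "15217"},
    {"age_group": "senior", "syndrome": "respiratory", "zip_code": "15217"},
]

# Number of records covered by the cluster.
count = sum(matches(r, cluster) for r in records)
```

A detector would compare such counts against an expected baseline; that
step is outside the scope of this sketch.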
To assist in the development of future complex multidimensional and
multivariate algorithms (including extensions to DAD), we also introduce
the T-Cube data structure that efficiently represents any time series
data with multiple categorical dimensions (typical in many fields of
application including surveillance). The T-Cube data structure (inspired
by AD-Trees for categorical count data) acts as a cache and responds
quickly to ad hoc queries during an investigation. It enables
processing of millions of time series during massive data mining
operations. We have successfully applied T-Cube to mine interesting
patterns in diverse projects involving temporal event data.
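
The kind of query such a cache answers can be sketched as follows. This
is not the T-Cube or AD-Tree structure itself: it is a plain memoized
scan, assumed here only to show the interface of asking for the count
time series that matches a set of categorical constraints. The class
name TimeSeriesCache, its methods, and the data are all hypothetical.

```python
from typing import Dict, List, Tuple


class TimeSeriesCache:
    """Toy cache of count time series over categorical dimensions."""

    def __init__(self, dims: List[str], n_steps: int) -> None:
        self.dims = list(dims)
        self.n_steps = n_steps
        self._index = {d: i for i, d in enumerate(self.dims)}
        self._events: List[Tuple[int, Tuple[str, ...]]] = []
        self._cache: Dict[frozenset, List[int]] = {}

    def add(self, t: int, record: Dict[str, str]) -> None:
        """Store one event at time step t and invalidate cached answers."""
        self._events.append((t, tuple(record[d] for d in self.dims)))
        self._cache.clear()

    def query(self, **constraints: str) -> List[int]:
        """Return counts per time step for events matching all constraints."""
        key = frozenset(constraints.items())
        if key not in self._cache:
            series = [0] * self.n_steps
            for t, values in self._events:
                if all(values[self._index[d]] == v
                       for d, v in constraints.items()):
                    series[t] += 1
            self._cache[key] = series
        return list(self._cache[key])  # copy so callers cannot edit the cache


# Hypothetical events over three time steps.
cube = TimeSeriesCache(dims=["syndrome", "zip_code"], n_steps=3)
cube.add(0, {"syndrome": "respiratory", "zip_code": "15213"})
cube.add(1, {"syndrome": "respiratory", "zip_code": "15217"})
cube.add(1, {"syndrome": "rash", "zip_code": "15213"})
cube.add(2, {"syndrome": "respiratory", "zip_code": "15213"})

all_events = cube.query()
respiratory = cube.query(syndrome="respiratory")
respiratory_15213 = cube.query(syndrome="respiratory", zip_code="15213")
```

Repeating a query returns the stored series instead of rescanning the
events. A tree-based structure would avoid the full scan on the first
query as well, which this sketch does not attempt.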
Thesis Committee:
Artur Dubrawski (Co-chair)
Jeff Schneider (Co-chair)
Aarti Singh
Greg Cooper (University of Pittsburgh)