From sabhnani+ at cs.cmu.edu Fri Aug 27 10:15:02 2010
From: sabhnani+ at cs.cmu.edu (Robin Sabhnani)
Date: Fri, 27 Aug 2010 10:15:02 -0400
Subject: [Research] proposal talk.
Message-ID: <4C77C866.6010807@cs.cmu.edu>

Hi all,

I am giving my thesis proposal talk this afternoon. You are welcome to
attend it. See announcement below.

####################

Date: 8/27/10
Time: 3:00pm
Place: 4405 GHC

PhD Candidate: Maheshkumar (Robin) Sabhnani

Title: Disjunctive Anomaly Detection: Identifying Complex Anomalous
Patterns

Abstract:

The problem of anomaly detection in multivariate time series data is
common to many applications of practical interest. Examples include
network intrusion detection systems, manufacturing processes, climate
studies, syndromic surveillance, and video stream processing. Our
motivating application is syndromic surveillance, which aims to detect
potential disease outbreaks in pre-diagnosis data to facilitate a timely
public health response. To achieve this goal, efficient data structures
and smart algorithms are needed to analyze highly multivariate temporal
data.

In this thesis work, we introduce Disjunctive Anomaly Detection (DAD), an
algorithm for detecting complex anomalous clusters in multivariate
datasets with categorical dimensions. Our proposed algorithm assumes
that an anomalous cluster can affect any subset of data dimensions (using
conjunctions) and any subset of values (using disjunctions) along each
data dimension. We believe that such a cluster definition is more
informative of real outbreaks than the current approaches. In addition,
the DAD algorithm models multiple anomalous clusters simultaneously,
hence promising better detection power in the presence of multiple
overlapping anomalous events.
So far, we have compared the DAD algorithm against relevant, powerful
alternatives on two important tasks: finding sample-variable
associations in cancer microarray data, and searching for emerging
disease outbreaks in public health data. Experimental results indicate
that DAD is able to detect and explain complex anomalous clusters better
than alternative approaches such as the Large Average Submatrix (LAS)
algorithm and the What's Strange About Recent Events (WSARE) algorithm.

To assist in the development of future complex multidimensional and
multivariate algorithms (including extensions to DAD), we also introduce
the T-Cube data structure, which efficiently represents any time series
data with multiple categorical dimensions (typical in many fields of
application, including surveillance). The T-Cube data structure (inspired
by AD-Trees for categorical count data) acts as a cache and quickly
responds to ad-hoc queries during an investigation. It enables
processing of millions of time series during massive data mining
operations. We have successfully applied T-Cube to mine interesting
patterns in diverse projects involving temporal event data.

Thesis Committee:
Artur Dubrawski (Co-chair)
Jeff Schneider (Co-chair)
Aarti Singh
Greg Cooper (University of Pittsburgh)

From komarek.paul at gmail.com Fri Aug 27 11:48:46 2010
From: komarek.paul at gmail.com (Paul Komarek)
Date: Fri, 27 Aug 2010 08:48:46 -0700
Subject: [Research] proposal talk.
In-Reply-To: <4C77C866.6010807@cs.cmu.edu>
References: <4C77C866.6010807@cs.cmu.edu>
Message-ID:

good luck Robin!

On Fri, Aug 27, 2010 at 7:15 AM, Robin Sabhnani wrote:
> Hi all,
>
> I am giving my thesis proposal talk this afternoon. You are welcome to
> attend it. See announcement below.
> _______________________________________________
> Research mailing list
> Research at autonlab.org
> https://www.autonlab.org/mailman/listinfo/research
>

From awd at cs.cmu.edu Fri Aug 27 11:58:25 2010
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 27 Aug 2010 11:58:25 -0400
Subject: [Research] [auton-users] proposal talk.
In-Reply-To:
References: <4C77C866.6010807@cs.cmu.edu> <4C77DE95.9020009@cs.cmu.edu>
Message-ID: <4C77E0A1.5060208@cs.cmu.edu>

Figures :) It is good to virtually see you though!

On 8/27/2010 11:50 AM, Paul Komarek wrote:
> I have a nail appointment that day.
>
> On Fri, Aug 27, 2010 at 8:49 AM, Artur Dubrawski wrote:
>> You're not coming Paul???
>>
>>
>> On 8/27/2010 11:48 AM, Paul Komarek wrote:
>>>
>>> good luck Robin!
>>>
>>> On Fri, Aug 27, 2010 at 7:15 AM, Robin Sabhnani
>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I am giving my thesis proposal talk this afternoon.