From awd at cs.cmu.edu Mon Feb 11 10:43:36 2008
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 11 Feb 2008 10:43:36 -0500
Subject: [Research] Lab Meeting tomorrow (Tue Feb 12th)
Message-ID: <47B06D28.3010703@cs.cmu.edu>

Time: 11:30am, Place: NSH 1507, Food: Yes.

Speaker:
Kevin Hutchinson, co-Founder and CEO, Health Monitoring Systems, Inc.

Title:
From Biosurveillance to Community Health Surveillance

Summary:
Health Monitoring Systems, Inc. (HMS) is developing a set of tools to support community health surveillance systems, giving government and industry what they need to monitor trends and receive early-warning alerts. The company offers an advanced online service, known as Epicenter, for managing community health surveillance, aimed at public health departments and agencies at all levels of government. Epicenter is an open-source, task-based web application designed to emulate the experience of a desktop application.

Epicenter also represents several innovations in the field of healthcare surveillance. Maps provided by Google, Inc. show detailed information at multiple geographic scales and allow intuitive user interaction with the data. Users can find and examine anomalies, and create investigations to track multiple anomalies in concert. Symptom classifiers have been added to accurately classify chief complaints that fall under multiple syndrome categories. Additional analytical tools assign probability values to anomalies, indicating the likelihood of such an observation, to give users a more meaningful view of events. Finally, Epicenter is designed to allow the addition of new data types, analytics, and visualization tools as it is further developed to better meet the expanding needs of public health.
From awd at cs.cmu.edu Tue Feb 12 09:57:20 2008
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 12 Feb 2008 09:57:20 -0500
Subject: [Research] LAB MEETING CANCELED
Message-ID: <47B1B3D0.9030302@cs.cmu.edu>

Unfortunately, our speaker for today's meeting cannot get here on account of bad weather. We'll reschedule his talk, and I will keep you informed.

Artur

From awd at cs.cmu.edu Fri Feb 15 17:38:04 2008
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 15 Feb 2008 17:38:04 -0500
Subject: [Research] Lab Meeting tomorrow (Tue Feb 19th)
In-Reply-To: <47B06D28.3010703@cs.cmu.edu>
References: <47B06D28.3010703@cs.cmu.edu>
Message-ID: <47B6144C.2020203@cs.cmu.edu>

On Tuesday the 19th, we will make another attempt at hosting Kevin Hutchinson! Nothing changes, except for the date.

Artur

From awd at cs.cmu.edu Tue Feb 19 17:14:55 2008
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 19 Feb 2008 17:14:55 -0500
Subject: [Research] Next Auton Lab meeting: Tuesday Feb 26, the usual time and place + an exciting topic!
Message-ID: <47BB54DF.4000508@cs.cmu.edu>

Linear-Time Subset Scanning
Daniel B. Neill (neill at cs.cmu.edu)

The scan statistic is a commonly used and powerful framework for detecting anomalous patterns. The typical scan statistic approach is to define a fixed set of "search regions" (each search region represents a subset of the data that may be of potential interest), find the regions that maximize some likelihood ratio statistic, and compute the statistical significance (or, alternatively, the posterior probability) of each region. Since there are exponentially many subsets of the data (2^N for N records), an exhaustive search over all of them is computationally infeasible. A typical approach is therefore to constrain the search regions (e.g. searching over circles or rectangles for spatial data), but any chosen set of search regions will have low power to detect patterns that do not correspond to these regions.
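To make the exponential cost concrete, here is a minimal Python sketch of an unconstrained exhaustive scan over every non-empty subset of N records. It assumes the expectation-based Poisson (EBP) statistic, where each record i has an observed count c_i and an expected (baseline) count b_i, and a subset with totals C and B scores C * log(C/B) + B - C when C > B and 0 otherwise. The function names and example data are hypothetical, for illustration only.

```python
import math
from itertools import combinations


def ebp_score(counts, baselines):
    """Expectation-based Poisson log-likelihood ratio for one subset.

    C is the total observed count and B the total expected count;
    the score is C*log(C/B) + B - C when C > B, and 0 otherwise.
    """
    c, b = sum(counts), sum(baselines)
    if c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def exhaustive_scan(counts, baselines):
    """Score every non-empty subset of records (2^N - 1 evaluations)."""
    n = len(counts)
    best_score, best_subset, evaluated = 0.0, (), 0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            evaluated += 1
            score = ebp_score([counts[i] for i in subset],
                              [baselines[i] for i in subset])
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset, evaluated


# Hypothetical example: four records with observed counts and baselines.
score, subset, evaluated = exhaustive_scan([12, 3, 9, 4], [5.0, 4.0, 4.0, 5.0])
print(subset, round(score, 3), evaluated)  # prints: (0, 2) 5.793 15
```

With 30 records this loop would already need over a billion evaluations, which is why practical scans restrict the set of search regions.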
However, many commonly used scan statistics (including the Kulldorff, expectation-based, and nonparametric statistics) have an intriguing property which we call "linear-time subset scanning" (LTSS): the group which maximizes the likelihood ratio statistic can be found by ordering the data records from most to least relevant and searching only the groups consisting of the top-k most relevant records, for k = 1..N. This requires linear rather than exponential time, since only N of the 2^N subsets need to be evaluated once the records are sorted. If we have a uniform prior over all subsets, the group found by LTSS also maximizes the posterior probability. However, we often want to maximize the posterior under hard constraints (e.g. some regions have zero prior probability) and/or soft constraints (e.g. some regions have higher priors than others). In these cases, the unconstrained optimum can be used as an upper bound on the constrained optimum for branch-and-bound search, or as an informed starting point for greedy heuristic search.

This work is still in the very early stages, so I'd like to lead an informal brainstorming session on how LTSS can be used to find either exact or approximate solutions to various constrained subset scan problems:

1. Network worm detection, searching over subnets of the network
2. Fast spatial scan over all distinct rectangular regions
3. Fast graph scan over all distinct connected subgraphs
4. What if we want to maximize the posterior over all subsets, with a non-uniform prior based on self-similarity?
5. What if we are given an arbitrary, but structured, list of "valid" subsets, and want to maximize over these?

Many other questions are also worth considering, including:

* How can LTSS be applied to multivariate pattern detection, with multiple event types and multiple data streams?
* What are the necessary and sufficient conditions for a scan statistic to have the LTSS property?
* How is LTSS related to submodularity?
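The LTSS property described above can be sketched in Python for the expectation-based Poisson statistic. This is a hedged illustration, not the speaker's implementation: it assumes each record has an observed count c_i and a positive baseline b_i, and it ranks records by the ratio c_i / b_i, which is the priority function commonly paired with this statistic. Only the N nested "top-k" subsets are scored, using running totals, and a brute-force search over all subsets is included to check that both reach the same maximum score. All names and data are hypothetical.

```python
import math
import random
from itertools import combinations


def ebp_score(c, b):
    """Expectation-based Poisson score from aggregate count c and baseline b."""
    if c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def ltss(counts, baselines):
    """Linear-time subset scan (sketch) for the EBP statistic.

    Records are sorted by the assumed priority c_i / b_i, highest first;
    only the N subsets made of the top-k records (k = 1..N) are scored.
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_k = 0.0, 0
    c = b = 0.0
    for k, i in enumerate(order, start=1):
        c += counts[i]          # running totals: each step is O(1)
        b += baselines[i]
        score = ebp_score(c, b)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, tuple(sorted(order[:best_k]))


def brute_force(counts, baselines):
    """Exhaustive reference: scores all 2^N - 1 non-empty subsets."""
    n = len(counts)
    best_score, best_subset = 0.0, ()
    for k in range(1, n + 1):
        for s in combinations(range(n), k):
            score = ebp_score(sum(counts[i] for i in s),
                              sum(baselines[i] for i in s))
            if score > best_score:
                best_score, best_subset = score, s
    return best_score, best_subset


# Randomized check that LTSS finds the same maximum score as brute force.
rng = random.Random(0)
for _ in range(200):
    n = rng.randint(1, 8)
    counts = [rng.randint(0, 20) for _ in range(n)]
    baselines = [rng.uniform(1.0, 10.0) for _ in range(n)]
    assert math.isclose(ltss(counts, baselines)[0],
                        brute_force(counts, baselines)[0], abs_tol=1e-9)

best, subset = ltss([12, 3, 9, 4], [5.0, 4.0, 4.0, 5.0])
print(subset, round(best, 3))  # prints: (0, 2) 5.793
```

Sorting dominates the cost (O(N log N)); the scan itself is a single linear pass. For the constrained problems listed above, this unconstrained maximum would serve only as an upper bound or a starting point, not as the constrained answer.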