From awd at cs.cmu.edu Thu Jul 17 01:22:58 2008
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Thu, 17 Jul 2008 01:22:58 -0400
Subject: [Research] Auton Lab Meeting: Monday July 21 10am NSH 1507
In-Reply-To: <009e01c8e690$2d9d45e0$fadc0280@adm.ri.cmu.edu>
References: <487CBEE0.8010707@cs.cmu.edu> <009e01c8e690$2d9d45e0$fadc0280@adm.ri.cmu.edu>
Message-ID: <487ED732.7050002@cs.cmu.edu>

Dear Autonians,

We will have a guest speaker, Daria Sorokina from Cornell, who will give the following, highly relevant talk. Time and place are given in the subject of this message; coffee/tea and cookies will be provided on the spot.

Looking forward to seeing you there.

Artur

-------

Title: Modeling Additive Structure and Detecting Interactions with Groves of Trees

Abstract:

A lot of research in machine learning and data mining is concentrated on building prediction models with the best possible performance. In most cases such models act as black boxes: they make good predictions, but do not provide much insight into the decision making process. This is unsatisfactory for domain scientists who also want to answer questions like: What effects do important features have on the response variable? Which features are involved in complex effects and should be studied only together with some other features? How can we visualize and interpret such complex effects? Separate post-processing techniques are needed to answer these questions.

The term statistical interaction is used to describe the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Thus, detecting statistical interactions can be critical for an understanding of processes by domain researchers.

In this talk I will describe an approach to interaction detection based on comparing the performance of unrestricted and restricted prediction models, where restricted models are prevented from modeling an interaction in question. I will present a new algorithm, Additive Groves, that has the right properties for this framework. Additive Groves is an ensemble of additive regression trees, based on such techniques as bagging and additive models; their combination allows us to use large trees in the ensemble and at the same time model the additive structure of the response function. I will demonstrate results of interaction detection analysis on real data describing the abundance of different species of birds in the prairies to the east of the southern Rocky Mountains.

In the second part I will talk more about the regression ensemble Additive Groves and its classification counterpart, Gradient Groves. I will show that these algorithms yield consistently high performance across a variety of problems, outperforming on average a large number of other algorithms.

This is joint work with Rich Caruana and Mirek Riedewald.
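
The restricted-versus-unrestricted comparison described in the abstract can be illustrated with a small Python sketch. This is not Additive Groves: as a stand-in, the unrestricted model is a table of per-cell means and the restricted model is a purely additive fit f(a) + g(b), which by construction cannot represent an interaction between the two features. All function names, the toy data, and the use of training error on a noise-free, balanced grid are assumptions made for illustration; a real analysis would compare performance on held-out data.

```python
import math
from itertools import product
from statistics import mean


def fit_unrestricted(rows):
    """Cell-means model: free to represent any joint effect of (a, b)."""
    cells = {}
    for a, b, y in rows:
        cells.setdefault((a, b), []).append(y)
    table = {key: mean(values) for key, values in cells.items()}
    return lambda a, b: table[(a, b)]


def fit_restricted(rows):
    """Additive model grand + f(a) + g(b): cannot model an a-b interaction.

    Estimating the effects from marginal means is the least-squares
    additive fit only when the design is balanced (assumed here).
    """
    grand = mean(y for _, _, y in rows)
    by_a, by_b = {}, {}
    for a, b, y in rows:
        by_a.setdefault(a, []).append(y)
        by_b.setdefault(b, []).append(y)
    effect_a = {a: mean(values) - grand for a, values in by_a.items()}
    effect_b = {b: mean(values) - grand for b, values in by_b.items()}
    return lambda a, b: grand + effect_a[a] + effect_b[b]


def rmse(model, rows):
    """Root mean squared error of a fitted model on the given rows."""
    return math.sqrt(mean((model(a, b) - y) ** 2 for a, b, y in rows))


def interaction_gap(rows):
    """How much worse the restricted model is than the unrestricted one."""
    return rmse(fit_restricted(rows), rows) - rmse(fit_unrestricted(rows), rows)


# Hypothetical toy data on a balanced 4x4 grid of feature values.
levels = [0.0, 1.0, 2.0, 3.0]
additive_data = [(a, b, 2.0 * a + b * b) for a, b in product(levels, levels)]
interacting_data = [(a, b, a * b) for a, b in product(levels, levels)]

gap_additive = interaction_gap(additive_data)        # no interaction present
gap_interacting = interaction_gap(interacting_data)  # a and b interact

print(f"gap (additive response):    {gap_additive:.4f}")
print(f"gap (interacting response): {gap_interacting:.4f}")
```

For the additive response the restricted model loses essentially nothing, so the gap is close to zero. For the product response the restricted model is clearly worse, which is the signal this style of test reads as evidence of an interaction.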