From huiyang at cs.cmu.edu Mon Sep 29 20:43:12 2008
From: huiyang at cs.cmu.edu (Grace Hui Yang)
Date: Mon, 29 Sep 2008 20:43:12 -0400
Subject: [IR Discussion Series] Oct 2nd 2pm in Wean 7220.
Message-ID: <48E17620.5080701@cs.cmu.edu>

Dear all,

Le Zhao will give our first IR talk of this semester. A reception will be
provided by Yahoo!. Here is the talk information:

Date: Thursday 2nd Oct 2008
Time: 2pm
Place: Wean Hall 7220

Speaker: Le Zhao
Title: A Generative Retrieval Model for Structured Documents

Abstract
Structured documents contain elements defined by the author(s) and
annotations assigned by other people or processes. Structured documents
pose challenges for probabilistic retrieval models when there are
mismatches between the structured query and the actual structure in a
relevant document, or erroneous structure introduced by an annotator. This
paper makes three contributions. First, a new generative retrieval model
is proposed to deal with the mismatch problem. This new model extends the
basic keyword language model by treating structure as a hidden variable
during the generation process. Second, variations of the model are
compared. Third, term-level and structure-level smoothing strategies are
studied. Evaluation was conducted with INEX XML retrieval and
question-answering retrieval tasks. Experimental results indicate that
the optimal structured retrieval model is task dependent, that two-level
Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer
smoothing, and that, with accurate structured queries, the proposed
structured retrieval model significantly outperforms keyword retrieval on
both the QA and INEX datasets.

Based on work accepted at CIKM'08.

See you then!

Grace, Jaime, Jon