[IR Discussion Series] Oct 2nd 2pm in Wean 7220.

Grace Hui Yang huiyang at cs.cmu.edu
Mon Sep 29 20:43:12 EDT 2008


Dear all,

    Le Zhao will give our first IR talk of the semester. A reception will
be provided by Yahoo!.  Here is the talk information:

    Date: Thursday 2nd Oct 2008
    Time: 2pm
    Place: Wean Hall 7220

    Speaker: Le Zhao
    Title: A Generative Retrieval Model for Structured Documents

Abstract
Structured documents contain elements defined by the author(s) and
annotations assigned by other people or processes.   Structured documents
pose challenges for probabilistic retrieval models when there are
mismatches between the structured query and the actual structure in a
relevant document or erroneous structure introduced by an annotator. This
paper makes three contributions.  First, a new generative retrieval model
is proposed to deal with the mismatch problem.  This new model extends the
basic keyword language model by treating structure as a hidden variable
during the generation process.  Second, variations of the model are
compared. Third, term-level and structure-level smoothing strategies are
studied.  Evaluation was conducted with INEX XML retrieval and
question-answering retrieval tasks.  Experimental results indicate that
the optimal structured retrieval model is task dependent, two-level
Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer
smoothing, and with accurate structured queries, the proposed structured
retrieval model significantly outperforms keyword retrieval on both the QA
and INEX datasets.

Based on work accepted at CIKM'08.


See you then!

Grace, Jaime, Jon



 

