[IR Discussion Series] Oct 2nd 2pm in Wean 7220.

Grace Hui Yang huiyang at cs.cmu.edu
Mon Sep 29 20:44:07 EDT 2008


Dear all,
>
>    Le Zhao will give our first IR talk of this semester. A reception
> will be provided by Yahoo!. Here is the talk information:
>
>    Date: Thursday 2nd Oct 2008
>    Time: 2pm
>    Place: Wean Hall 7220
>
>    Speaker: Le Zhao
>    Title: A Generative Retrieval Model for Structured Documents
>
> Abstract
> Structured documents contain elements defined by the author(s) and
> annotations assigned by other people or processes.   Structured documents
> pose challenges for probabilistic retrieval models when there are
> mismatches between the structured query and the actual structure in a
> relevant document or erroneous structure introduced by an annotator. This
> paper makes three contributions.  First, a new generative retrieval model
> is proposed to deal with the mismatch problem.  This new model extends the
> basic keyword language model by treating structure as a hidden variable
> during the generation process.  Second, variations of the model are
> compared. Third, term-level and structure-level smoothing strategies are
> studied.  Evaluation was conducted with INEX XML retrieval and
> question-answering retrieval tasks.  Experimental results indicate that
> the optimal structured retrieval model is task-dependent, two-level
> Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer
> smoothing, and with accurate structured queries, the proposed structured
> retrieval model outperforms keyword retrieval significantly, on both QA
> and INEX datasets.
>
> Based on work accepted at CIKM'08.
>
>
> See you then!
>
> Grace, Jaime, Jon


