Reminder: [IR Discussion Series] Today 2pm in Wean 7220.
Grace Hui Yang
huiyang at cs.cmu.edu
Thu Oct 2 11:59:29 EDT 2008
Dear all,
>>
>> Le Zhao will give our first IR talk of this semester. A reception
>> will be provided by Yahoo!. Here is the talk
>> information:
>>
>> Date: Thursday 2nd Oct 2008
>> Time: 2pm
>> Place: Wean Hall 7220
>>
>> Speaker: Le Zhao
>> Title: A Generative Retrieval Model for Structured Documents
>>
>> Abstract
>> Structured documents contain elements defined by the author(s) and
>> annotations assigned by other people or processes. Structured documents
>> pose challenges for probabilistic retrieval models when there are
>> mismatches between the structured query and the actual structure in a
>> relevant document or erroneous structure introduced by an annotator. This
>> paper makes three contributions. First, a new generative retrieval model
>> is proposed to deal with the mismatch problem. This new model extends the
>> basic keyword language model by treating structure as a hidden variable
>> during the generation process. Second, variations of the model are
>> compared. Third, term-level and structure-level smoothing strategies are
>> studied. Evaluation was conducted with INEX XML retrieval and
>> question-answering retrieval tasks. Experimental results indicate that
>> the optimal structured retrieval model is task dependent, two-level
>> Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer
>> smoothing, and with accurate structured queries, the proposed structured
>> retrieval model outperforms keyword retrieval significantly, on both QA
>> and INEX datasets.
>>
>> Based on work accepted at CIKM'08.
>>
>> See you then!
>>
>> Grace, Jaime, Jon