[IR Series] - TREC 2007 talks - Friday, Nov 2 - 12:00 (noon), NSH 3002

Grace Hui Yang huiyang at cs.cmu.edu
Wed Oct 31 13:52:54 EDT 2007


Hi,
   There will be two talks on this year's TREC in this week's IR series.

   Lunch will be provided by Yahoo!
 
   Time:  Friday, Nov 2, 12:00-1:00pm
   Location:  NSH 3002
------------------------------------------------------------------------------------  

   First Talk:  (12:00-12:30)

   Speaker: Jonathan Elsas
   Title: CMU at the TREC 07 Blog Track: Retrieval and Feedback Models 
for Blog Distillation

   Abstract:

Feed distillation (or "feed search") is the task of finding blog feeds 
with a principal, recurring interest in X, where X is some information 
need expressed as a query. Thus, the input to the system is a query 
and the output is a ranked list of blog feeds. Tailoring a system for feed 
search requires making several design decisions. In this work, we 
explored the following: (1) Is it more effective to treat this task as 
feed retrieval, viewing each feed as a single document, or as entry 
retrieval, where ranked entries are aggregated into an overall feed 
ranking? (2) How can query expansion be appropriately performed for this 
task? Two different approaches are compared. The first one is based on 
pseudo-relevance feedback using the target collection. The second is a 
simple novel technique that expands the query with N-grams obtained from 
Wikipedia hyperlinks.
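
As a rough illustration of the entry-retrieval option in (1), the sketch 
below rolls ranked entries up into a feed ranking. The abstract does not 
say how entries are aggregated, so summing entry scores per feed is only 
an assumption here, and the function name, entry IDs, feed IDs and scores 
are all hypothetical.

```python
from collections import defaultdict


def rank_feeds(ranked_entries):
    """Aggregate entry-level retrieval scores into a feed-level ranking.

    ranked_entries: iterable of (entry_id, feed_id, score) tuples.
    Aggregation by summing scores is an assumption, not the talk's method.
    """
    feed_scores = defaultdict(float)
    for _entry_id, feed_id, score in ranked_entries:
        feed_scores[feed_id] += score
    # Highest total first; ties broken by feed id so the order is stable.
    return sorted(feed_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical entry-level results for a single query.
entries = [
    ("e1", "feedA", 0.9),
    ("e2", "feedB", 0.8),
    ("e3", "feedA", 0.4),
    ("e4", "feedC", 0.3),
]
ranking = rank_feeds(entries)
```

Other aggregation choices (for example, taking the best entry per feed or 
normalizing by the number of entries) would slot into the same loop.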

This talk presents CMU's system and results for the Feed Distillation 
task in the Blog track at TREC 2007.  CMU's submission is expected to be 
among the top performers in the TREC Blog Track this year.

  
-----------------------------------------------------------------------------------
Second Talk: (12:30-1:00)
 
Speaker: Le Zhao and Yangbo Zhu
Title: Structured Queries for Legal Search

Abstract:

This talk reports on experiments using Indri for the main and 
routing (relevance feedback) tasks in the TREC 2007 Legal Track. For the 
main task, we analyze ranking algorithms using different fields, boolean 
constraints and structured operators. Evaluation results show that 
structured queries outperform bag-of-words ones. Boolean constraints 
improve both precision and recall. For the routing task, we train a 
linear SVM classifier for each topic. Terms with the largest weights are 
selected to form new queries. Both keywords and simple structured 
features (term.field) have been investigated. Named-Entity tags, 
LingPipe sentence breaker and metadata fields of the original documents 
are used to generate the field information. Results show that structured 
features and weighted queries improve retrieval, but only marginally. 
We also show which structures are more useful. It turns out that metadata 
fields are not as important as we had thought.
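
As a rough illustration of the routing step, the sketch below picks the 
terms with the largest weights from an already-trained linear model and 
formats them as an Indri-style #weight query. The weights, the term names 
(including the term.field feature "memo.title"), the cutoff k and the 
choice to keep only positive weights are hypothetical and not taken from 
the talk.

```python
def build_weighted_query(term_weights, k=3):
    """Form a weighted query from the k largest positive term weights.

    term_weights: dict mapping a feature (a keyword or a term.field
    string) to its weight in a linear classifier trained for one topic.
    """
    positive = [(t, w) for t, w in term_weights.items() if w > 0]
    # Largest weight first; ties broken alphabetically for determinism.
    top = sorted(positive, key=lambda tw: (-tw[1], tw[0]))[:k]
    parts = " ".join(f"{w:.2f} {t}" for t, w in top)
    return f"#weight( {parts} )"


# Hypothetical weights for one topic.
weights = {
    "tobacco": 1.8,
    "memo.title": 1.2,
    "lunch": -0.7,
    "filter": 0.9,
    "the": 0.05,
}
query = build_weighted_query(weights, k=3)
```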


   See you there!

  Grace, Jon, Jaime


