[IR Series] - TREC 2007 talks - Friday, Nov 2 - 12:00 (noon), NSH 3002
Grace Hui Yang
huiyang at cs.cmu.edu
Wed Oct 31 13:52:54 EDT 2007
Hi,
There will be two talks on this year's TREC in this week's IR series.
Lunch will be provided by Yahoo!
Time: Friday, Nov 2, 12:00-1:00pm
Location: NSH 3002
------------------------------------------------------------------------------------
First Talk: (12:00-12:30)
Speaker: Jonathan Elsas
Title: CMU at the TREC 07 Blog Track: Retrieval and Feedback Models
for Blog Distillation
Abstract:
Feed distillation (or "feed search") is the task of finding blog feeds
with a principal, recurring interest in X, where X is some information
need expressed as a query. Thus, the input to the system is a query
and the output is a ranked list of blog feeds. Tailoring a system for feed
search requires making several design decisions. In this work, we
explored the following: (1) Is it more effective to treat this task as
feed retrieval, viewing each feed as a single document; or entry
retrieval, where ranked entries are aggregated into an overall feed
ranking? (2) How can query expansion be appropriately performed for this
task? Two different approaches are compared. The first one is based on
pseudo-relevance feedback using the target collection. The second is a
simple novel technique that expands the query with N-grams obtained from
Wikipedia hyperlinks.
This talk presents CMU's system and results for the Feed Distillation
task in the Blog track at TREC 2007. CMU's group is expected to have one
of the top-performing submissions to the TREC Blog Track this year.
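
To make the "entry retrieval" option in question (1) concrete, here is a
minimal sketch of aggregating ranked entries into a feed ranking. The
abstract does not say how entries are aggregated, so summing entry scores
per feed is only an assumption, and the function name, entry IDs, feed
names and scores are all made up for illustration.

```python
from collections import defaultdict


def rank_feeds(entry_scores, entry_to_feed):
    """Aggregate per-entry retrieval scores into a ranked list of feeds.

    Summing scores is an assumed aggregation rule, not necessarily the
    one used in the talk.
    """
    totals = defaultdict(float)
    for entry_id, score in entry_scores.items():
        totals[entry_to_feed[entry_id]] += score
    # Highest aggregated score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical entry-level retrieval scores and entry-to-feed mapping.
entry_scores = {"e1": 0.9, "e2": 0.4, "e3": 0.7, "e4": 0.3}
entry_to_feed = {"e1": "feedA", "e2": "feedA", "e3": "feedB", "e4": "feedC"}

ranking = rank_feeds(entry_scores, entry_to_feed)
for feed, score in ranking:
    print(feed, round(score, 2))
```

The "feed retrieval" alternative would skip the aggregation step and score
each feed directly as one large document.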
-----------------------------------------------------------------------------------
Second Talk: (12:30-1:00)
Speakers: Le Zhao and Yangbo Zhu
Title: Structured Queries for Legal Search
Abstract:
This talk reports experiments using Indri for the main and
routing (relevance feedback) tasks in the TREC 2007 Legal Track. For the
main task, we analyze ranking algorithms using different fields, Boolean
constraints and structured operators. Evaluation results show that
structured queries outperform bag-of-words ones. Boolean constraints
improve both precision and recall. For the routing task, we train a
linear SVM classifier for each topic. Terms with the largest weights are
selected to form new queries. Both keywords and simple structured
features (term.field) have been investigated. Named-Entity tags,
LingPipe sentence breaker and metadata fields of the original documents
are used to generate the field information. Results show that structured
features and weighted queries improve retrieval, but only marginally.
We also show which structures are more useful. It turns out metadata
fields are not as important as we thought.
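
As a rough illustration of the routing-task idea of turning the
highest-weighted terms into a new query, here is a small sketch. It assumes
the per-topic classifier weights are already available as a dictionary and
that "largest" means largest positive weight; the terms, weights and
function name are invented for the example. The output uses Indri's #weight
operator and term.field syntax.

```python
def build_weighted_query(term_weights, k=3):
    """Form an Indri-style #weight query from the k largest positive weights.

    term_weights maps a feature (a keyword or a term.field pair) to its
    weight in a linear classifier; the values here are hypothetical.
    """
    positive = [(t, w) for t, w in term_weights.items() if w > 0]
    top = sorted(positive, key=lambda tw: tw[1], reverse=True)[:k]
    parts = " ".join(f"{w:.2f} {t}" for t, w in top)
    return f"#weight( {parts} )"


# Hypothetical weights for one topic; "memo.title" is a term.field feature.
weights = {
    "tobacco": 1.8,
    "memo.title": 1.2,
    "lunch": -0.4,
    "nicotine": 0.9,
    "the": 0.05,
}

query = build_weighted_query(weights, k=3)
print(query)
```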
See you there!
Grace, Jon, Jaime