[IR Series] Two upcoming IR talks on Wed, Oct 13, 2010
Grace Hui Yang
huiyang at cs.cmu.edu
Wed Oct 6 16:43:44 EDT 2010
Hi,
Please join us for two upcoming IR series talks on Wednesday, Oct 13, 2010.
Lunch will be provided by Yahoo!.
Date/Time: Wednesday, Oct 13, 2010, noon
Place: GHC 4405
First Speaker: Le Zhao
Title: Term Necessity Prediction
Abstract:
The probability that a term appears in relevant documents (P(t|R)) is
a fundamental quantity in several probabilistic retrieval models;
however, it is difficult to estimate without relevance judgments or a
relevance model. We call this value term necessity because it measures
the percentage of relevant documents retrieved by the term – how
necessary a term’s occurrence is to document relevance. Prior research
typically either set this probability to a constant, or estimated it
based on the term's inverse document frequency, neither of which was
very effective.
This paper identifies several factors that affect term necessity, for
example, a term’s topic centrality, synonymy and abstractness. It
develops term- and query-dependent features for each factor that
enable supervised learning of a predictive model of term necessity
from training data. Experiments with two popular retrieval models and
6 standard datasets demonstrate that using predicted term necessity
estimates as user term weights for the original query terms leads to
significant improvements in retrieval accuracy.
The paper will appear in CIKM 2010.
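
For readers unfamiliar with the quantity, the definition of term necessity
given in the abstract (the fraction of relevant documents in which a term
occurs) can be illustrated with a small sketch. This is only a toy
illustration that assumes relevance judgments are available; it is not the
paper's method, which predicts the value when such judgments are missing.
The function name, variable names, and sample documents are hypothetical.

```python
from typing import Dict, Iterable, Set


def term_necessity(term: str, relevant_docs: Iterable[Set[str]]) -> float:
    """Empirical estimate of P(t|R): the fraction of known relevant
    documents that contain `term`."""
    docs = list(relevant_docs)
    if not docs:
        raise ValueError("need at least one relevant document")
    return sum(term in doc for doc in docs) / len(docs)


# Hypothetical toy data: each relevant document is represented as a set of terms.
relevant = [
    {"term", "weighting", "retrieval"},
    {"query", "term", "prediction"},
    {"retrieval", "model", "probabilistic"},
    {"term", "necessity", "retrieval"},
]

necessity: Dict[str, float] = {
    t: term_necessity(t, relevant) for t in ("term", "retrieval", "necessity")
}
```

In this toy data, "term" occurs in three of the four relevant documents and
"necessity" in only one, so the former would receive the higher estimate.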
----------------------------------------------------
Second Speaker: Jon Elsas
Title: Rank Learning for Factoid Question Answering with Linguistic and
Semantic Constraints
Abstract:
This work presents a general rank-learning framework for leveraging deep
linguistic and semantic features for passage ranking within Question
Answering (QA) systems. The passage ranking framework enables query-time
checking of these complex and long-distance constraints among question
features such as keywords and named entities. These constraints can
include keyword ordering, annotation type-checking, verb-argument
attachment and arbitrary long-distance paths through an annotation
graph. We show that a trained ranking model using this rich feature set
achieves greater than a 20% improvement in Mean Average Precision over
baseline keyword retrieval models. We also show that for questions
expressing the most complex linguistic and semantic constraints, further
gains in MAP are realized, yielding a 40% improvement over the baseline.
The paper will appear in CIKM 2010.