From jelsas at cs.cmu.edu  Tue Jul  6 13:45:43 2010
From: jelsas at cs.cmu.edu (Jonathan Elsas)
Date: Tue, 6 Jul 2010 13:45:43 -0400
Subject: [IR-Series] Talk this friday: Grace Hui Yang
Message-ID: <AANLkTimWXpLj080U_GHTe2ICwGwbnCAFneLeTDdJMJZn@mail.gmail.com>

Please join us for the IR series talk this Friday.  Lunch will be
provided by Yahoo!

Time:  Noon
Location:  GHC 4405

Speaker: Grace Hui Yang, Language Technologies Institute, School of
Computer Science, CMU

Title: Collecting High Quality Overlapping Labels at Low Cost.

Abstract:
This paper studies the quality of human labels used to train search
engines' rankers. Our specific focus is performance improvements
obtained by using overlapping relevance labels, that is, by collecting
multiple human judgments for each training sample. The paper explores
whether, when, and for which samples one should obtain overlapping
training labels, as well as how many labels per sample are needed. The
proposed selective labeling scheme collects additional labels only for
a subset of training samples, specifically for those that are labeled
relevant by a judge. Our experiments show that this labeling scheme
improves the NDCG of two Web search rankers on several real-world test
sets, with a low labeling overhead of around 1.4 labels per sample.
This labeling scheme also outperforms several methods of using
overlapping labels, such as simple k-overlap, majority vote, the
highest labels, etc. Finally, the paper presents a study of how many
overlapping labels are needed to get the best improvement in retrieval
accuracy.

This paper is published in Proceedings of the 33rd Annual ACM SIGIR
Conference (SIGIR2010), Geneva, Switzerland, July 19-23, 2010.


From jelsas at cs.cmu.edu  Tue Jul 13 14:25:04 2010
From: jelsas at cs.cmu.edu (Jonathan Elsas)
Date: Tue, 13 Jul 2010 14:25:04 -0400
Subject: [IR-Series] Talk this friday: Jaime Arguello
Message-ID: <AANLkTilamN3q932pdFbbfPOlN8ExnKXvYE7m76Rc_iPv@mail.gmail.com>

Join us for another IR series talk this Friday.  There will be
**LUNCH** provided by Yahoo!

Time: Noon
Location: GHC 6501

Speaker: Jaime Arguello (http://www.cs.cmu.edu/~jaime/)

Title: Vertical Selection in the Presence of Unlabeled Verticals.

Vertical aggregation is the task of incorporating results from specialized
search engines or verticals (e.g., images, video, news) into Web search
results.  Vertical selection is the subtask of deciding, given a query,
which verticals, if any, are relevant.  State-of-the-art approaches use
machine-learned models to predict which verticals are relevant to a query.
When trained using a large set of labeled data, a machine-learned
vertical selection model outperforms baselines that require no training
data.  Unfortunately, whenever a new vertical is introduced, a costly new
set of editorial data must be gathered.  In this paper, we propose methods
for reusing training data from a set of existing (source) verticals to
learn a predictive model for a new (target) vertical.  We study methods
for learning robust, portable, and adaptive cross-vertical models.
Experiments show the need to focus on different types of features when
maximizing portability (the ability for a single model to make accurate
predictions across multiple verticals) than when maximizing adaptability
(the ability for a single model to make accurate predictions for a
specific vertical).  We demonstrate the efficacy of our methods through
extensive experimentation for 11 verticals.

This is joint work with Fernando Diaz and Jean-Francois Paiement from
Yahoo! Labs and will be presented at SIGIR 2010.

From jelsas at cs.cmu.edu  Fri Jul 16 09:40:40 2010
From: jelsas at cs.cmu.edu (Jonathan Elsas)
Date: Fri, 16 Jul 2010 09:40:40 -0400
Subject: Fwd: [IR-Series] Talk this friday: Jaime Arguello
In-Reply-To: <AANLkTilamN3q932pdFbbfPOlN8ExnKXvYE7m76Rc_iPv@mail.gmail.com>
References: <AANLkTilamN3q932pdFbbfPOlN8ExnKXvYE7m76Rc_iPv@mail.gmail.com>
Message-ID: <AANLkTildhRyrmNgzbcO3_rP2647gmLPx_Xx6Lc4HBwj2@mail.gmail.com>

Reminder -- this talk is today.
