[IR-Series] Talk this Friday: Grace Hui Yang

Jonathan Elsas jelsas at cs.cmu.edu
Tue Jul 6 13:45:43 EDT 2010


Please join us for the IR series talk this Friday.  Lunch will be
provided by Yahoo!

Time:  Noon
Location:  GHC 4405

Speaker: Grace Hui Yang, Language Technologies Institute, School of
Computer Science, CMU

Title: Collecting High Quality Overlapping Labels at Low Cost.

Abstract:
This paper studies the quality of human labels used to train search
engines’ rankers. Our specific focus is the performance improvement
obtained by using overlapping relevance labels, that is, by collecting
multiple human judgments for each training sample. The paper explores
whether, when, and for which samples one should obtain overlapping
training labels, as well as how many labels per sample are needed. The
proposed selective labeling scheme collects additional labels only for
a subset of training samples, specifically for those that are labeled
relevant by a judge. Our experiments show that this labeling scheme
improves the NDCG of two Web search rankers on several real-world test
sets, with a low labeling overhead of around 1.4 labels per sample.
This labeling scheme also outperforms several methods of using
overlapping labels, such as simple k-overlap, majority vote, and taking
the highest label. Finally, the paper presents a study of how many
overlapping labels are needed to get the best improvement in retrieval
accuracy.

This paper appears in the Proceedings of the 33rd Annual ACM SIGIR
Conference (SIGIR 2010), Geneva, Switzerland, July 19-23, 2010.
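
For readers unfamiliar with the idea, here is a rough Python sketch of
the selective labeling scheme described in the abstract: every sample
gets one label, and extra labels are requested only when that first
label is "relevant". This is an illustration only, not the authors'
implementation. The judge callback, the relevance threshold, the number
of extra labels, and the toy judgments are all assumptions made up for
this example, as are the tie-breaking rule in majority_vote and the
helper names.

```python
from collections import Counter
from typing import Callable, Dict, List


def selective_labels(
    samples: List[str],
    judge: Callable[[str, int], int],
    extra: int = 2,                # assumed number of additional judgments
    relevant_threshold: int = 1,   # assumed: label >= 1 counts as "relevant"
) -> Dict[str, List[int]]:
    """Collect one label per sample, plus `extra` more only if the first is relevant."""
    collected: Dict[str, List[int]] = {}
    for sample in samples:
        first = judge(sample, 0)
        labels = [first]
        if first >= relevant_threshold:
            labels += [judge(sample, i) for i in range(1, extra + 1)]
        collected[sample] = labels
    return collected


def majority_vote(labels: List[int]) -> int:
    """Most frequent label; ties go to the higher label (an arbitrary choice)."""
    counts = Counter(labels)
    top = max(counts.values())
    return max(label for label, count in counts.items() if count == top)


def highest_label(labels: List[int]) -> int:
    """Take the highest label any judge assigned."""
    return max(labels)


def labels_per_sample(collected: Dict[str, List[int]]) -> float:
    """Average number of labels gathered per sample (the labeling overhead)."""
    return sum(len(v) for v in collected.values()) / len(collected)


# Hypothetical judgments: toy[sample][i] is what the i-th judge would say.
toy = {
    "d1": [0, 1, 0],
    "d2": [2, 1, 2],
    "d3": [0, 0, 0],
    "d4": [1, 0, 0],
    "d5": [0, 2, 2],
}
collected = selective_labels(list(toy), lambda s, i: toy[s][i])

print(collected)
print("labels per sample:", labels_per_sample(collected))
print("majority vote for d4:", majority_vote(collected["d4"]))
print("highest label for d4:", highest_label(collected["d4"]))
```

Note that in this toy data d5 receives a single "not relevant" label and
is never re-judged, even though later judges would have marked it
relevant; how the paper handles that trade-off and how it combines the
collected labels are not stated in the abstract.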


