From jelsas+ at cs.cmu.edu  Mon Jul 13 17:09:03 2009
From: jelsas+ at cs.cmu.edu (Jonathan Elsas)
Date: Mon, 13 Jul 2009 17:09:03 -0400
Subject: [IR Series] - Jaime Arguello - Thursday July 16th, 2009, 11:00 AM - Wean Hall 7220
In-Reply-To: <4A1C0E1D.2090309@cs.cmu.edu>
References: <4A1C0E1D.2090309@cs.cmu.edu>
Message-ID:

Hello -- Please join us for an IR series talk this Thursday. NOTE the different time & location.

Speaker: Jaime Arguello (LTI, CMU)
Time & Date: Thursday July 16th, 2009, 11:00 AM
Place: Wean Hall 7220

Lunch will be provided by Yahoo!

Title: Sources of Evidence for Vertical Selection

Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection: predicting the relevant verticals (if any) for queries issued to a search engine's main web search page. In contrast to prior collection selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independently of external resources, (2) logs of queries previously issued directly to the vertical by users, and (3) corpora representative of vertical content. These sources of evidence are integrated as features in a classification-based approach. We make use of, and compare against, prior work in federated search and retrieval effectiveness prediction. Our evaluation focuses on 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. An in-depth error analysis reveals unique challenges across different verticals and provides insight into vertical selection for future work.

Based on work conducted at Yahoo! Labs Montreal, to be presented at SIGIR 2009.
Thanks,

Jon, Jaime & Grace

From jelsas+ at cs.cmu.edu  Tue Jul 14 15:34:47 2009
From: jelsas+ at cs.cmu.edu (Jonathan Elsas)
Date: Tue, 14 Jul 2009 15:34:47 -0400
Subject: [IR Series] - Jaime Arguello & Pinar Donmez - Thursday July 16th, 2009, 11:00 AM - Wean Hall 7220
In-Reply-To:
References: <4A1C0E1D.2090309@cs.cmu.edu>
Message-ID:

UPDATE: We will have two talks at this IR series session; both Jaime and Pinar will be presenting their work in preparation for SIGIR next week.

Speaker: Pinar Donmez
Time & Date: see below
Title: On the Local Optimality of LambdaRank

A machine learning approach to rank learning trains a model to optimize a target evaluation measure with respect to training data. Currently, existing information retrieval measures are impossible to optimize directly except for models with a trivial number of parameters. The IR community thus faces a major challenge: how to optimize IR measures of interest directly. In this paper, we present a solution. Specifically, we show that LambdaRank [1], which smoothly approximates the gradient of the target measure, can be adapted to work with three popular IR target evaluation measures using the same underlying gradient construction. It is likely, therefore, that this construction is extendable to other evaluation measures. We empirically show that LambdaRank finds a locally optimal solution for NDCG, MAP and MRR with a 99% confidence rate. We also show that the amount of effective training data varies with the IR measure, and that with a sufficiently large training set, matching the training optimization measure to the target evaluation measure yields the best accuracy.

This work was conducted jointly with Krysta Svore and Chris Burges during an internship at MSR Redmond. It will be presented at SIGIR '09.

From jelsas+ at cs.cmu.edu  Thu Jul 16 11:05:03 2009
From: jelsas+ at cs.cmu.edu (Jonathan Elsas)
Date: Thu, 16 Jul 2009 11:05:03 -0400
Subject: [IR Series] - Jaime Arguello & Pinar Donmez - Thursday July 16th, 2009, 11:00 AM - Wean Hall 7220
In-Reply-To:
References: <4A1C0E1D.2090309@cs.cmu.edu>
Message-ID:

REMINDER: IR Series talk NOW.
WEH 7220