A machine learning approach for improved BM25 retrieval



Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine the effectiveness of BM25 on various document fields. We find that BM25 models relevance on popularity fields such as anchor text and query click information no better than a linear function of the field attributes. We also find query click information to be the single most important field for retrieval. In response, we develop a machine learning approach to BM25-style retrieval that learns, using LambdaRank, from the input attributes of BM25. Our model significantly improves retrieval effectiveness over BM25 and BM25F. Our data-driven approach is fast, effective, avoids the problem of parameter tuning, and can directly optimize for several common information retrieval measures. We demonstrate the advantages of our model on a very large real-world Web data collection.
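For readers unfamiliar with the baseline, the following is a minimal sketch of a common textbook BM25 variant applied to a single field. It is not the authors' code or their learned model. The function name bm25_score, the toy data, and the defaults k1=1.2 and b=0.75 are assumptions chosen for illustration. The sketch shows the kind of attributes BM25 consumes (term frequency, document frequency, field length) and the two free parameters that a learned ranker would not need to hand-tune.

```python
import math


def bm25_score(query_terms, field_terms, doc_freq, num_docs, avg_field_len,
               k1=1.2, b=0.75):
    """Score one document field against a query with a textbook BM25 variant.

    All names and default values here are illustrative assumptions,
    not taken from the paper.
    """
    field_len = len(field_terms)
    score = 0.0
    for term in set(query_terms):
        tf = field_terms.count(term)      # term frequency in this field
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)        # number of documents containing the term
        # Smoothed IDF; the "1 +" inside the log keeps the value non-negative.
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Length normalization controlled by b, saturation controlled by k1.
        norm = k1 * (1.0 - b + b * field_len / avg_field_len)
        score += idf * tf * (k1 + 1.0) / (tf + norm)
    return score


if __name__ == "__main__":
    # Hypothetical toy collection statistics.
    doc_freq = {"bm25": 2, "retrieval": 5}
    title = ["bm25", "retrieval", "model"]
    print(bm25_score(["bm25", "retrieval"], title, doc_freq,
                     num_docs=10, avg_field_len=3.0))
```

Both k1 and b must normally be tuned per field, which is the parameter-tuning burden the abstract says the learned approach avoids.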

Krysta Marie Svore, Christopher J. C. Burges
