Sciweavers

SIGIR
2011
ACM

When documents are very long, BM25 fails!

We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namely BM25L, which “shifts” the term frequency normalization formula to boost scores of very long documents. Our experiments show that BM25L, with the same computation cost, is more effective and robust than the standard BM25.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Retrieval models
General Terms: Algorithms
Keywords: BM25, BM25L, term frequency, very long documents
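
The "shift" mentioned in the abstract can be illustrated with a small sketch. The abstract itself does not give the formula, so everything below is an assumption based on the commonly cited form of BM25L: the length-normalized term frequency is offset by a constant `delta` before saturation is applied. The function names, the default values `k1=1.2`, `b=0.75`, `delta=0.5`, and the omission of the IDF factor are illustrative choices, not details taken from this page.

```python
def length_norm(doc_len, avg_doc_len, b=0.75):
    """Pivoted length normalization factor shared by both variants."""
    return 1.0 - b + b * doc_len / avg_doc_len


def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of standard BM25 (IDF factor omitted)."""
    if tf <= 0:
        return 0.0
    # Normalized term frequency: shrinks toward 0 as the document grows.
    ctd = tf / length_norm(doc_len, avg_doc_len, b)
    return (k1 + 1.0) * ctd / (k1 + ctd)


def bm25l_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75, delta=0.5):
    """Assumed BM25L term-frequency component: same as BM25, but the
    normalized term frequency is shifted by delta for matched terms."""
    if tf <= 0:
        return 0.0
    ctd = tf / length_norm(doc_len, avg_doc_len, b)
    # The shift keeps the score of a matched term bounded away from 0,
    # no matter how long the document is.
    return (k1 + 1.0) * (ctd + delta) / (k1 + ctd + delta)


if __name__ == "__main__":
    # One occurrence of the query term in documents of growing length,
    # with a hypothetical average document length of 100 tokens.
    for doc_len in (100, 1_000, 100_000):
        print(doc_len, bm25_tf(1, doc_len, 100), bm25l_tf(1, doc_len, 100))
```

Under these assumptions, the BM25 component for a single match approaches zero as the document grows, while the shifted variant stays above `(k1 + 1) * delta / (k1 + delta)`. Both functions do the same amount of arithmetic per term, apart from one extra addition.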
Yuanhua Lv, ChengXiang Zhai
Added 17 Sep 2011
Updated 17 Sep 2011
Type Conference
Year 2011
Where SIGIR
Authors Yuanhua Lv, ChengXiang Zhai