What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine t...
This paper uses the URL word breaking task as an example to elaborate what we identify as crucialin designingstatistical natural language processing (NLP) algorithmsfor Web scale ...
Kuansan Wang, Christopher Thrasher, Bo-June Paul H...
Traditionally, search engines have ignored the reading difficulty of documents and the reading proficiency of users in computing a document ranking. This is one reason why Web se...
Kevyn Collins-Thompson, Paul N. Bennett, Ryen W. W...
Depending on a web searcher’s familiarity with a query’s target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defi...