Despite the widespread use of BM25, there have been few studies examining its effectiveness on document descriptions built from single fields and from combinations of multiple fields. We determine t...
The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neig...
High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. ...
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in web-scal...
A large amount of empirically derived world knowledge is essential for many language-processing tasks, to create expectations that can help assess plausibility and guide disambigua...