We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
In TREC-10, Microsoft Research Asia (MSRA) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 ve...
Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang,...
In this paper we report on our natural language information retrieval (NLIR) project as related to the recently concluded 5th Text Retrieval Conference (TREC-5). The main thrust o...
Tomek Strzalkowski, Fang Lin, Jose Perez Carballo,...
The field of information retrieval has witnessed over 50 years of research on retrieval methods for metadata descriptions and controlled indexing languages, the prototypical exam...
Current web search engines return result pages containing mostly text summary even though the matched web pages may contain informative pictures. A text excerpt (i.e. snippet) is ...