Duplicate URLs cause serious trouble across the whole pipeline of a search engine, from crawling and indexing to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Numerous systems exist for mining the web in search of relevant information, but few exist for the discovery of interesting information. The discovery of interesting informat...
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
This paper describes the WebCLEF 2007 task. The task definition—which goes beyond traditional navigational queries and is concerned with undirected information search goals—c...
Tags in social tagging systems store meaning for the taggers who have entered them, and other users often share this understanding. The result of this, a folksonomy, is typically ...