Query substitution is an important problem in information retrieval. Much work focuses on how to find substitutes for any given query. In this paper, we study how to efficiently ...
The dominant method for evaluating search engines is the Cranfield paradigm, but the existing metrics do not consider some features of modern search engines, such as document snippets...
Incremental hierarchical text document clustering algorithms are important for organizing documents generated from streaming online sources, such as newswire and blogs. However, ...
Statistical machine learning techniques for data classification usually assume that all entities are i.i.d. (independent and identically distributed). However, real-world entities...
In this paper, we propose a new method of citation data clustering for author name disambiguation. Most citation data appearing in the reference section of scientific papers incl...