Abstract. Clustering is often considered the most important unsupervised learning problem and several clustering algorithms have been proposed over the years. Many of these algorit...
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambi...
Abstract. Text documents have sparse data spaces, and nearest neighbors may belong to different classes when using current existing proximity measures to describe the correlation ...
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Abstract. In this paper we investigate a general purpose interactive information organization system. The system organizes documents by placing them into 1-, 2-, or 3-dimensional s...