In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hier...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
An approximate search query on a collection of strings finds those strings in the collection that are similar to a given query string, where similarity is defined using a given si...
As peer-to-peer (P2P) networks become more familiar to the database community, intense interest has built up in using their scalability and resilience properties to scale database...
Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in convent...