Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non...
George Beskales, Marcus Fontoura, Maxim Gurevich, ...
When working on a new project, researchers need to devote a significant amount of time and effort to surveying the relevant literature. This is required in order to gain experti...
A major difference between corporate intranets and the Internet is that in intranets the barrier for users to create web pages is much higher. This limits the amount and quality o...
Pavel A. Dmitriev, Nadav Eiron, Marcus Fontoura, E...
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hier...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...