Users prefer to navigate subjects from organized topics in an abundance resources than to list pages retrieved from search engines. We propose a framework to cluster frequent items...
Structured community portals extract and integrate information from raw Web pages to present a unified view of entities and relationships in the community. In this paper we argue...
Pedro DeRose, Warren Shen, Fei Chen 0002, AnHai Do...
This paper offers a novel look at using a dimensionalityreduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
The World Wide Web (WWW) has provided us with a plethora of information. However, given its unstructured format, this information is useful mainly to humans and cannot be effectiv...