There is an exploding amount of user-generated content on the Web due to the emergence of "Web 2.0" services, such as Blogger, MySpace, Flickr, and del.icio.us. The part...
Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tsen...
Domain-specific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsear...
Andrew McCallum, Kamal Nigam, Jason Rennie, Kristi...
Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may...
Traditional science searched for new objects and phenomena that led to discoveries. Tomorrow's science will combine the large pool of information in scientific archi...
Tanu Malik, Alexander S. Szalay, Tamas Budavari, A...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...