Most template detection methods process web pages in batches, so a newly crawled page cannot be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
In this paper, we describe the design of a profile generator toolkit, which aims to automatically create realistic user profiles for a mobile personalized portal service. These pr...
In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish d...
Text extraction from a web image is important for web indexing because the text can contain key information about the web page. This paper presents a method to detect text with variou...
In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) to index an image’s multiple features into a single one-dimensional structure. Both f...