Most template detection methods process web pages in batches that a newly crawled page can not be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
Linguists and geographers are more and more interested in route direction documents because they contain interesting motion descriptions and language patterns. A large number of s...
Xiao Zhang, Prasenjit Mitra, Sen Xu, Anuj R. Jaisw...
Web Directories are repositories of Web pages organized in a hierarchy of topics and sub-topics. In this paper, we present DirectoryRank, a ranking framework that orders the pages...
Vlassis Krikos, Sofia Stamou, Pavlos Kokosis, Alex...
Authoring dynamic web pages is an inherently difficult task. We present DESK, an interactive authoring tool that allows the customization of dynamic page generation procedures wit...
Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...