Abstract. A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...
In the research area of automatic web information extraction, there is a need for permanent and annotated web page collections enabling objective performance evaluation of differen...
In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today’s Web sources are “human-readable...
We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering for obtaining data extraction patterns...
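The snippet above only names the ingredients: a tree matching metric combined with clustering. The sketch below shows how those ingredients can fit together, under stated assumptions. It uses the classic unweighted Simple Tree Matching algorithm rather than the weighted metric the abstract refers to, normalized by tree size. It then adds a greedy threshold clustering step. The names `Node`, `similarity`, `cluster` and the `threshold=0.8` default are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical minimal DOM node: a tag name plus ordered children.
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    # Number of nodes in the subtree rooted at t.
    return 1 + sum(size(c) for c in t.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    # Classic (unweighted) Simple Tree Matching: size of the maximum
    # order-preserving top-down matching between two trees.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i][j - 1],
                M[i - 1][j],
                M[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    # Normalize the match size to [0, 1] by the average tree size.
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


def cluster(trees: List[Node], threshold: float = 0.8) -> List[List[Node]]:
    # Greedy clustering (an assumption, not the paper's method): add each tree
    # to the first cluster whose first member is similar enough, else start one.
    clusters: List[List[Node]] = []
    for t in trees:
        for c in clusters:
            if similarity(c[0], t) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


# Example: two structurally identical table rows, one shorter row, one div.
row1 = Node("tr", [Node("td"), Node("td", [Node("a")])])
row2 = Node("tr", [Node("td"), Node("td", [Node("a")])])
row3 = Node("tr", [Node("td")])
div = Node("div", [Node("p")])
groups = cluster([row1, row2, row3, div])
```

Here `row1` and `row2` match perfectly (similarity 1.0) and fall into one cluster. `row3` scores about 0.67 against them and `div` scores 0, so each forms its own group. In a record extractor, the largest cluster of repeated sibling subtrees would be a candidate set of data records.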
Linguists and geographers are more and more interested in route direction documents because they contain interesting motion descriptions and language patterns. A large number of s...
Xiao Zhang, Prasenjit Mitra, Sen Xu, Anuj R. Jaisw...