In this paper, we propose a new approach, called FiVaTech, for the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema an...
Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...
Collaborative work on unstructured or semistructured documents, such as literature corpora or source code, often involves agreed-upon templates containing metadata. These templ...
Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Each printed instance of a specific class of document can now have different degrees of ...
Steven R. Bagley, David F. Brailsford, James A. Ol...
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
Many web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its b...