This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...
Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting informatio...
Documents in the Web are often organized using category trees by information providers (e.g. CNN, BBC) or search engines (e.g. Google, Yahoo!). Such category trees are commonly kn...
A great deal of information on the Web is represented in both textual and structured form. The structured form is machinereadable and can be used to augment the textual data. We c...
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users' queries. The majority of these documents are genera...
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...