
WWW
2011
ACM

OXPath: little language, little memory, great value

Data about everything is readily available on the web, but often only accessible through elaborate user interactions. For automated decision support, extracting that data is essential, but infeasible with existing heavy-weight data extraction systems. In this demonstration, we present OXPath, a novel approach to web extraction, with a system that supports informed job selection and integrates information from several different web sites. By carefully extending XPath, OXPath exploits its familiarity and provides a light-weight interface, which is easy to use and embed. We highlight how OXPath guarantees optimal page buffering, storing only a constant number of pages for non-recursive queries.
Categories and Subject Descriptors H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services
General Terms Languages, Algorithms
Keywords Web extraction, web automation, XPath, AJAX
Added 15 May 2011
Updated 15 May 2011
Type Journal
Year 2011
Where WWW
Authors Andrew Jon Sellers, Tim Furche, Georg Gottlob, Giovanni Grasso, Christian Schallhart