OXPath: little language, little memory, great value

Data about everything is readily available on the web, but often only accessible through elaborate user interactions. For automated decision support, extracting that data is essential, but infeasible with existing heavy-weight data extraction systems. In this demonstration, we present OXPath, a novel approach to web extraction, with a system that supports informed job selection and integrates information from several different web sites. By carefully extending XPath, OXPath exploits its familiarity and provides a light-weight interface, which is easy to use and embed. We highlight how OXPath guarantees optimal page buffering, storing only a constant number of pages for non-recursive queries.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services

General Terms: Languages, Algorithms

Keywords: Web extraction, web automation, XPath, AJAX

Elaborate User Interactions | Heavy-weight Data Extraction | Internet Technology | Keywords Web Extraction | WWW 2011 |

Post Info

Added	15 May 2011
Updated	15 May 2011
Type	Journal
Year	2011
Where	WWW
Authors	Andrew Jon Sellers, Tim Furche, Georg Gottlob, Giovanni Grasso, Christian Schallhart
