Extracting Route Directions from Web Pages

Download webdb09.cse.buffalo.edu

Linguists and geographers are more and more interested in route direction documents because they contain interesting motion descriptions and language patterns. A large number of such documents can be easily found on the Internet. A challenging task is to automatically extract meaningful route parts, i.e. destinations, origins and instructions, from route direction documents. However, no work exists on this issue. In this paper, we introduce our effort toward this goal. Based on our observation that sentences are the basic units for route parts, we extract sentences from HTML documents using both the natural language knowledge and HTML tag information. Additionally, we study the sentence classification problem in route direction documents and its sequential nature. Several machine learning methods are compared and analyzed. The impacts of different sets of features are studied. Based on the obtained insights, we propose to use sequence labelling models such as CRFs and MEMMs and the...
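
The abstract mentions extracting sentences from HTML using both natural language knowledge and HTML tag information. The sketch below is one possible reading of that idea, not the authors' method: block-level tags are treated as hard boundaries, and a simple punctuation rule splits the text inside each block. The tag set `BLOCK_TAGS`, the class `BlockTextParser`, and the function `extract_sentences` are hypothetical names introduced for illustration only.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags mark boundaries that a sentence never crosses.
BLOCK_TAGS = {"p", "div", "li", "br", "td", "tr", "ul", "ol", "table",
              "h1", "h2", "h3", "h4"}


class BlockTextParser(HTMLParser):
    """Collects text into blocks, cutting at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Join buffered text, normalize whitespace, keep non-empty blocks.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Note: script/style content is not filtered in this sketch.
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_sentences(html):
    """Return candidate sentences from an HTML string."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    sentences = []
    for block in parser.blocks:
        # Naive rule: split after ., ! or ? when followed by whitespace
        # and an uppercase letter. Abbreviations such as "St." followed
        # by a capitalized word will be split incorrectly.
        parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", block)
        sentences.extend(p for p in parts if p)
    return sentences


page = ("<h1>Directions to Campus</h1>"
        "<ul><li>Take I-80 West. Exit at 161</li>"
        "<li>Turn left onto Main St.</li></ul>")
for sentence in extract_sentences(page):
    print(sentence)
```

The resulting sentence list would then be the input to a classifier or sequence labeller that assigns each sentence a route-part label; that step is not shown here.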

Xiao Zhang, Prasenjit Mitra, Sen Xu, Anuj R. Jaiswal, Alex Klippel, Alan M. MacEachren

Internet Technology | Meaningful Route Parts | Route Direction Documents | Route Parts | WEBDB 2009 |

» Web page title extraction and its application

» Learning Page-Independent Heuristics for Extracting Data from Web Pages

» Data Extraction from Web Data Sources

» On the Automatic Extraction of Data from the Hidden Web

» Syntactic Folding and its Application to the Information Extraction from Web Pages

» Data Extraction from Web Database Query Result Pages via Tagsets and Integer Sequences

» Using visual cues for extraction of tabular data from arbitrary HTML documents

» Silk from a Sow's Ear: Extracting Usable Structures from the Web

Post Info

Added	25 May 2010
Updated	25 May 2010
Type	Conference
Year	2009
Where	WEBDB
Authors	Xiao Zhang, Prasenjit Mitra, Sen Xu, Anuj R. Jaiswal, Alex Klippel, Alan M. MacEachren

Sciweavers
