Sciweavers

ICDE
2006
IEEE

Query Selection Techniques for Efficient Crawling of Structured Web Sources

High-quality structured data from structured Web sources is invaluable for many applications. Hidden Web databases cannot be crawled directly by Web search engines; they are accessible only through Web query forms or Web service interfaces. Recent research has focused on understanding these query forms. A critical but still largely unresolved question is how to efficiently acquire the structured information inside Web databases by iteratively issuing meaningful queries. In this paper we focus on the central issue of enabling efficient Web database crawling through query selection, i.e., how to select good queries to rapidly harvest data records from Web databases. We model each structured Web database as a distinct attribute-value graph. Under this theoretical framework, the database crawling problem is transformed into a graph traversal problem that follows "relational" links. We show that finding an optimal query selection plan is equivalent to f...
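The attribute-value graph idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `RECORDS` table, the `query` stand-in for a Web form, the `QUERYABLE` attribute set, and in particular the greedy rule that picks the next query by how often a value has been seen so far. The paper's own selection method is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy database. A real crawler could only reach it through a query form.
RECORDS = [
    {"title": "A", "author": "Smith", "year": "2001"},
    {"title": "B", "author": "Smith", "year": "2002"},
    {"title": "C", "author": "Jones", "year": "2002"},
    {"title": "D", "author": "Lee", "year": "2005"},
]

# Assumption: only these attributes can be filled in on the (imaginary) form.
QUERYABLE = {"author", "year"}


def query(attr, value):
    """Stand-in for one form submission: indices of records matching attr=value."""
    return [i for i, rec in enumerate(RECORDS) if rec.get(attr) == value]


def build_av_graph(records):
    """Nodes are (attribute, value) pairs; an edge links two pairs that
    co-occur in at least one record."""
    graph = defaultdict(set)
    for rec in records:
        nodes = list(rec.items())
        for u in nodes:
            for v in nodes:
                if u != v:
                    graph[u].add(v)
    return graph


def crawl(seed, max_queries=10):
    """Traverse the graph by issuing queries, starting from one known value.

    Greedy heuristic (an assumption, not the paper's method): query the
    frontier value that appears in the most records harvested so far.
    """
    harvested = set()             # indices of records retrieved so far
    seen_count = defaultdict(int)  # (attr, value) -> harvested records containing it
    issued = set()
    frontier = {seed}
    queries = []
    while frontier and len(queries) < max_queries:
        node = max(sorted(frontier), key=lambda n: seen_count[n])
        frontier.discard(node)
        issued.add(node)
        queries.append(node)
        for idx in query(*node):
            if idx in harvested:
                continue
            harvested.add(idx)
            # Each new record exposes new values, i.e. new nodes to follow.
            for item in RECORDS[idx].items():
                seen_count[item] += 1
                if item[0] in QUERYABLE and item not in issued:
                    frontier.add(item)
    return harvested, queries


if __name__ == "__main__":
    harvested, queries = crawl(("author", "Smith"))
    print(sorted(harvested), queries)
```

In this toy run the record by "Lee" is never retrieved, because it shares no queryable value with the seed's connected component. That shows why crawling reduces to reachability in the graph, and why the order in which queries are issued determines how many submissions are spent on records already harvested.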
Ping Wu, Ji-Rong Wen, Huan Liu, Wei-Ying Ma
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2006
Where ICDE