Testbed for information extraction from deep web

Search results generated by searchable databases are served dynamically, and their total volume is far larger than that of the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate results across different searchable databases, we need to extract the target data from their results pages. We propose a testbed for information extraction from search results. We chose 100 databases at random from 114,540 pages with search forms, so the databases show good variety. We selected 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.

Categories and Subject Descriptors: H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness)

General Terms: Experimentation

Keywords: Deep Web, Meta Search, Testbed, Wrapper
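
The abstract does not say which evaluation measures the testbed uses. As a rough illustration only, the sketch below scores an extraction method against manually identified target URLs using precision, recall, and F1, which are assumed here as stand-ins and are not taken from the paper. The function name and the sample URLs are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(extracted: Iterable[str],
                        target: Iterable[str]) -> Tuple[float, float, float]:
    """Score extracted items against manually labelled target items.

    Assumption: items (e.g. result URLs) are compared by exact string match.
    """
    extracted_set = set(extracted)
    target_set = set(target)
    correct = len(extracted_set & target_set)

    # Guard against empty sets to avoid division by zero.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(target_set) if target_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical data for one results page: labelled targets vs. extractor output.
    target_urls = ["http://example.org/a", "http://example.org/b", "http://example.org/e"]
    extracted_urls = ["http://example.org/a", "http://example.org/b",
                      "http://example.org/c", "http://example.org/d"]
    p, r, f = precision_recall_f1(extracted_urls, target_urls)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scoring each of the labelled databases this way and averaging would give one way to compare two extraction methods on the same testbed, under the exact-match assumption above.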

Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, Sachio Hirokawa

Extraction Methods | Internet Technology | Keywords Deep Web | Target Data | WWW 2004 |

» Deep web data extraction

» Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

» Ontology-based information extraction and integration from heterogeneous data sources

» Site-Wide Wrapper Induction for Life Science Deep Web Databases

» Towards Deeper Understanding of the Search Interfaces of the Deep Web

» Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web

» Deep Annotation for Information Integration

» ISP-Enabled Behavioral Ad Targeting without Deep Packet Inspection

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2004
Where	WWW
Authors	Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, Sachio Hirokawa
