

Testbed for information extraction from deep web

Search results generated by searchable databases are served dynamically, and their total volume is far larger than that of the static documents on the Web. These result pages have been referred to as the Deep Web [1]. To integrate results from different searchable databases, we need to extract the target data from their result pages. We propose a testbed for information extraction from search results. We chose 100 databases at random from 114,540 pages with search forms, so the testbed covers a wide variety of databases. From these, we selected 51 databases whose result pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.

Categories and Subject Descriptors: H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness)
General Terms: Experimentation
Keywords: Deep Web, Meta Search, Testbed, Wrapper
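The abstract does not spell out which evaluation measures are proposed. As a purely illustrative sketch, assuming that an extraction method's output and the manually identified target information for a result page can both be represented as sets of strings (for example, URLs), per-page precision, recall, and F-measure could be computed and then macro-averaged so that each database counts equally. The function names `precision_recall_f1` and `macro_average` are hypothetical, and this is not presented as the authors' method.

```python
from typing import Iterable, Sequence, Tuple

Score = Tuple[float, float, float]


def precision_recall_f1(extracted: Iterable[str], target: Iterable[str]) -> Score:
    """Score one result page (hypothetical helper, assumed measure).

    Assumption: extracted items and manually identified target items are
    compared as exact strings (e.g. URLs); duplicates are ignored.
    """
    extracted_set = set(extracted)
    target_set = set(target)
    hits = len(extracted_set & target_set)  # items both extracted and in the target set
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(target_set) if target_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(scores: Sequence[Score]) -> Score:
    """Average per-page scores so each database contributes equally (assumed choice)."""
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    p = sum(s[0] for s in scores) / n
    r = sum(s[1] for s in scores) / n
    f = sum(s[2] for s in scores) / n
    return p, r, f


if __name__ == "__main__":
    # Hypothetical page: two correct URLs extracted, one spurious, one missed.
    page = precision_recall_f1(
        ["http://a.example/1", "http://a.example/2", "http://ads.example/x"],
        ["http://a.example/1", "http://a.example/2", "http://a.example/3"],
    )
    print(page)
```

Macro-averaging is only one option; micro-averaging (pooling hits across all pages) would instead weight databases by how many target items their result pages contain.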
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2004
Where WWW
Authors Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, Sachio Hirokawa