The techniques of information retrieval and information extraction are complementary, but to date there has been little concrete work aimed at integrating the two. We describe how...
Search results generated by searchable databases are served dynamically and far larger than the static documents on the Web. These results pages have been referred to as the Deep ...
Yasuhiro Yamada, Nick Craswell, Tetsuya Nakatoh, S...
Users prefer to navigate subjects from organized topics in an abundance resources than to list pages retrieved from search engines. We propose a framework to cluster frequent items...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aim to so...
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...