Searching for Hidden-Web Databases

Recently, there has been increased interest in the retrieval and integration of hidden-Web data with a view to leveraging the high-quality information available in online databases. Although previous work has addressed many aspects of the actual integration, including matching form schemata and automatically filling out forms, the problem of locating relevant data sources has been largely overlooked. Given the dynamic nature of the Web, where data sources are constantly changing, it is crucial to discover these resources automatically. However, considering the number of documents on the Web (Google already indexes over 8 billion documents), automatically finding tens, hundreds, or even thousands of forms that are relevant to the integration task is like looking for a few needles in a haystack. Moreover, since the vocabulary and structure of forms for a given domain are unknown until the forms are actually found, it is hard to define exactly what to look for. We propose a new crawl...

Luciano Barbosa, Juliana Freire
