Classification-aware hidden-web text database selection

Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over multiple such "hidden-web" text databases at once through a unified query interface. An important step in the metasearching process is database selection, or determining which databases are the most relevant for a given user query. The state-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and the associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel "focused probing" sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are rep...
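The pipeline the abstract describes (sample a hidden database through its query interface, build an approximate content summary from the sample, then rank databases for a query) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method. The sampler uses plain random single-word queries in the spirit of the earlier sampling work the abstract refers to, not the paper's topic-aware "focused probing." The ranking score is a bGlOSS-style estimate, one of several known selection scores, chosen here only for brevity. All function names, database names, documents, and database sizes are hypothetical.

```python
import random
from collections import Counter
from math import prod


def make_search_interface(documents):
    """Hypothetical stand-in for a hidden-web database: its documents are
    reachable only through a keyword query returning at most k matches."""
    def search(word, k=3):
        return [d for d in documents if word in d.lower().split()][:k]
    return search


def sample_by_querying(search, seed_words, max_docs=50, max_queries=100, seed=0):
    """Random query-based sampling (not focused probing): send single-word
    queries, keep unseen documents, and draw later query words from the
    documents retrieved so far."""
    rng = random.Random(seed)
    sample, vocabulary = [], list(seed_words)
    for _ in range(max_queries):
        if len(sample) >= max_docs or not vocabulary:
            break
        word = rng.choice(vocabulary)
        for doc in search(word):
            if doc not in sample:
                sample.append(doc)
                vocabulary.extend(doc.lower().split())
    return sample[:max_docs]


def build_summary(sample_docs):
    """Approximate content summary: for each word, the fraction of sampled
    documents that contain it (a sample-based document frequency)."""
    if not sample_docs:
        return {}
    df = Counter()
    for doc in sample_docs:
        df.update(set(doc.lower().split()))
    return {word: count / len(sample_docs) for word, count in df.items()}


def estimate_matches(query, summary, db_size):
    """bGlOSS-style score (assumed, not from the paper): estimated number of
    documents containing every query word, treating words as independent."""
    return db_size * prod(summary.get(w, 0.0) for w in query.lower().split())


def select_databases(query, databases, k=1):
    """Rank databases by estimated matches and return the top k with a
    nonzero score. `databases` maps name -> (summary, estimated size)."""
    scored = [(name, estimate_matches(query, summary, size))
              for name, (summary, size) in databases.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, score in scored[:k] if score > 0]


# Usage with two toy databases (hypothetical names, documents, and sizes).
medical_docs = [
    "breast cancer treatment study",
    "cancer gene therapy trial",
    "heart disease risk study",
]
sports_docs = [
    "soccer world cup final",
    "basketball playoffs final",
    "cancer survivor runs marathon",
]
medical_sample = sample_by_querying(make_search_interface(medical_docs), ["cancer"])
sports_sample = sample_by_querying(make_search_interface(sports_docs), ["final"])
databases = {
    "medical_db": (build_summary(medical_sample), 30000),
    "sports_db": (build_summary(sports_sample), 50000),
}
print(select_databases("cancer treatment", databases, k=2))  # ['medical_db']
```

In this toy run the sports sample never reaches the marathon document, because no chain of query words leads from the seed to it. This shows how a purely random, query-based sample can miss parts of a database's content, which in turn skews the summary used for selection.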

Panagiotis G. Ipeirotis, Luis Gravano

Database Selection | Databases | Summaries | TOIS 2008

Post Info

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2008
Where	TOIS
Authors	Panagiotis G. Ipeirotis, Luis Gravano

Sciweavers