Sciweavers

VLDB
1998
ACM

Determining Text Databases to Search in the Internet

Text data on the Internet can be naturally partitioned into many databases. Efficient retrieval of desired data can be achieved if we can accurately predict the usefulness of each database, because with such information we only need to retrieve potentially useful documents from useful databases. In this paper, we propose two new methods for estimating the usefulness of text databases. For a given query, the usefulness of a text database is defined in this paper as the number of documents in the database that are sufficiently similar to the query. Such a usefulness measure enables naive users to make informed decisions about which databases to search. We also consider the collection fusion problem. Because local databases may employ similarity functions that differ from the one used by the global database, the threshold a local database uses to decide whether a document is potentially useful may differ from the threshold used by the global database. We provide techniques that...
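To make the usefulness definition in the abstract concrete, here is a minimal sketch that counts, by brute force, the documents in each database whose similarity to a query exceeds a threshold. It is not the paper's method: the paper proposes ways to estimate this number, whereas the sketch computes it exactly by scanning every document. Cosine similarity over raw term counts, the strict `>` comparison, and the names `cosine_similarity`, `usefulness`, and `rank_databases` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine similarity between two bags of words (raw term counts).

    Assumption: the abstract does not fix a similarity function;
    cosine similarity is used here only as a common example.
    """
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def usefulness(query, database, threshold):
    """Number of documents whose similarity to the query exceeds the threshold.

    `database` is a list of document strings (hypothetical representation).
    """
    q = query.lower().split()
    return sum(
        1 for doc in database
        if cosine_similarity(q, doc.lower().split()) > threshold
    )


def rank_databases(query, databases, threshold):
    """Order database names by usefulness, most useful first."""
    scores = {name: usefulness(query, docs, threshold)
              for name, docs in databases.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, invented for this sketch.
databases = {
    "db_a": ["text database search", "text database", "cooking recipes"],
    "db_b": ["gardening tips", "sports news"],
}
ranking = rank_databases("text database", databases, threshold=0.5)
```

With this toy data, `ranking` lists `db_a` ahead of `db_b`, so a user would search `db_a` first. A real metasearch system would not have access to every document in each database, which is why estimation methods such as those the paper proposes are needed.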
Added 06 Aug 2010
Updated 06 Aug 2010
Type Conference
Year 1998
Where VLDB
Authors Weiyi Meng, King-Lup Liu, Clement T. Yu, Xiaodong Wang, Yuhsi Chang, Naphtali Rishe