An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save the user the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query [2, 5, 16, 21, 27, 18]. In this paper, we focus on the first of the two technical challenges of metasearching, namely database selection. Past research uses a pre-collected summary of each database to estimate its "relevancy" to the query, and in many cases makes incorrect database selections. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution, derived by sampling that database. Using the probabilistic model, the user can explicitly specify a desired level of certainty for database selection. The adaptive probing technique decides which and how many databases to co...
Zhenyu Liu, Chang Luo, Junghoo Cho, Wesley W. Chu