

Capturing collection size for distributed non-cooperative retrieval

Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate than previous methods, and are more efficient in their use of the resources required to perform the estimation.
Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Selection Process; H.3.4 [Systems and software]: Distributed Systems; H.3.7 [Digital Libraries]: ...
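The abstract does not spell out the two proposed estimators. As a rough illustration of the general ecological idea it refers to, the following is a minimal sketch of the classic two-sample capture-recapture (Lincoln-Petersen) estimate applied to document identifiers. It is not the authors' method; the function name and the sample data are hypothetical, and the sketch assumes both samples are drawn independently and uniformly from the collection, which query-based sampling of a real search interface may not satisfy.

```python
def lincoln_petersen(first_sample, second_sample):
    """Estimate population size from two samples of document IDs.

    Classic two-sample capture-recapture estimate: N ~ n1 * n2 / m, where
    n1 and n2 are the numbers of distinct documents in each sample and m is
    the number of documents that appear in both. Illustrative only; this is
    not the estimator proposed in the paper.
    """
    first, second = set(first_sample), set(second_sample)
    recaptured = len(first & second)  # documents seen in both samples
    if recaptured == 0:
        raise ValueError("no overlap between samples; cannot estimate size")
    return len(first) * len(second) / recaptured


# Hypothetical document IDs returned by two rounds of sampling.
sample_a = ["d1", "d2", "d3", "d4"]
sample_b = ["d3", "d4", "d5", "d6"]
print(lincoln_petersen(sample_a, sample_b))  # 4 * 4 / 2 = 8.0
```

The smaller the overlap between the two samples, the larger the estimated collection; with no overlap at all the estimate is undefined, which is why the sketch raises an error in that case.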
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Authors Milad Shokouhi, Justin Zobel, Falk Scholer, Seyed M. M. Tahaghoghi