Generalising multiple capture-recapture to non-uniform sample sizes

Algorithms in distributed information retrieval often rely on accurate knowledge of the size of a collection. The "multiple capture-recapture" method of Shokouhi et al. is one of the more reliable algorithms for determining collection size, but it relies on samples with a uniform number of documents. Such uniform samples are often hard to obtain in a working system. A simple generalisation of multiple capture-recapture does not rely on uniform sample sizes. Simulations show it is as accurate as the original method even when sample sizes vary considerably, making it a useful technique in real tools.

Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software--distributed systems
General Terms: Experimentation, Measurement
Keywords: Size estimation
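
The abstract does not give the estimator itself, so the following is only a minimal sketch of how a capture-recapture estimate can work when sample sizes differ. It assumes a pooled pairwise estimator: if two random samples of sizes m_i and m_j are drawn from a collection of N documents, they are expected to share about m_i * m_j / N documents, so N can be estimated as the sum of m_i * m_j over all pairs divided by the total number of shared documents observed. The function names, sample sizes, and collection size are hypothetical, and this should not be read as the paper's exact formulation.

```python
import random
from itertools import combinations


def estimate_collection_size(samples):
    """Pooled pairwise capture-recapture estimate.

    `samples` is a list of iterables of document identifiers. The samples
    may have different sizes. This is an illustrative estimator, assumed
    for this sketch rather than taken from the paper.
    """
    expected_pairs = 0  # sum of m_i * m_j over all pairs of samples
    duplicates = 0      # documents shared within each pair, summed
    for a, b in combinations(samples, 2):
        set_a, set_b = set(a), set(b)
        expected_pairs += len(set_a) * len(set_b)
        duplicates += len(set_a & set_b)
    if duplicates == 0:
        raise ValueError("no overlap between samples; cannot estimate size")
    return expected_pairs / duplicates


def simulate(true_size=10_000, sizes=(200, 350, 500, 800, 1000, 1500), seed=0):
    """Draw uniform random samples of varying size and estimate the size."""
    rng = random.Random(seed)
    samples = [rng.sample(range(true_size), m) for m in sizes]
    return estimate_collection_size(samples)


if __name__ == "__main__":
    print(f"estimated size: {simulate():.0f} (true size: 10000)")
```

When every sample has the same size m and there are T samples, the numerator reduces to T(T-1)m^2/2, so the uniform case falls out as a special case of the same expression. The sketch also assumes documents are sampled uniformly at random, which real query-based sampling does not guarantee.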

Paul Thomas

Information Technology | Multiple Capture-recapture | SIGIR 2008 | Such Uniform Samples | Uniform Sample |

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2008
Where	SIGIR
Authors	Paul Thomas
