SIGIR
2006
ACM

Distributed query sampling: a quality-conscious approach

We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its self-learning and self-configuring ability based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]; H.3.1 [Content Analysis and Indexing]
General Terms: Algorithms, Experi...
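
The scheduling step in the second phase can be illustrated with a small sketch. The abstract does not specify the three quality-conscious schemes, so nothing below reproduces the authors' method: the names `DatabaseEstimate` and `allocate_samples`, the quality score in [0, 1], and the weighting rule (estimated size times quality shortfall) are all assumptions made for illustration. The sketch only shows how a fixed sampling budget for the next round might be divided across databases using size and quality estimates from the previous round.

```python
from dataclasses import dataclass


@dataclass
class DatabaseEstimate:
    """Per-database estimates carried over from the previous sampling round.

    Both fields are hypothetical inputs; the abstract does not define them.
    """
    name: str
    estimated_size: int  # estimated number of documents in the database
    quality: float       # assumed quality score of the current sample, in [0, 1]


def allocate_samples(estimates, budget):
    """Divide `budget` documents across databases for the next round.

    Assumed rule (not from the paper): weight each database by
    estimated_size * (1 - quality), so large databases whose current
    samples are poor receive more of the budget.
    """
    if not estimates:
        return {}

    weights = [e.estimated_size * (1.0 - e.quality) for e in estimates]
    total = sum(weights)

    if total == 0:
        # Every sample is already judged perfect (or every database is
        # empty): fall back to an even split.
        base, extra = divmod(budget, len(estimates))
        return {
            e.name: base + (1 if i < extra else 0)
            for i, e in enumerate(estimates)
        }

    # Proportional shares, rounded down, then hand out the remaining
    # documents to the largest fractional remainders so the allocation
    # sums exactly to the budget.
    raw = [budget * w / total for w in weights]
    alloc = [int(r) for r in raw]
    leftover = budget - sum(alloc)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1

    return {e.name: a for e, a in zip(estimates, alloc)}
```

In an iterative loop, the estimates would be recomputed after each round and passed back into `allocate_samples`, which is one simple way to read the "dynamically scheduled" behavior the abstract describes.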
James Caverlee, Ling Liu, Joonsoo Bae
Added: 14 Jun 2010
Updated: 14 Jun 2010
Type: Conference
Year: 2006
Where: SIGIR
Authors: James Caverlee, Ling Liu, Joonsoo Bae