Distributed query sampling: a quality-conscious approach

16 years 26 days ago


We present an adaptive, quality-conscious distributed query-sampling framework for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase, the sampling process is dynamically scheduled based on the estimated database size and the quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its ability to learn and configure itself based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]; H.3.1 [Content Analysis and Indexing]. General Terms: Algorithms, Experi...
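The two-phase process described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: every function name here is hypothetical, each database is modeled as a plain list of document strings, and the quality score (the vocabulary size of the current sample) is an assumed stand-in for the paper's three quality-conscious schemes, which the abstract does not specify. The sketch also ignores the database size estimate that the framework uses alongside quality.

```python
def vocabulary(docs):
    """Set of distinct whitespace-separated terms in a list of documents."""
    return {term for doc in docs for term in doc.split()}


def query_sample(db, query, k):
    """Return up to k documents from db that contain the query term."""
    return [doc for doc in db if query in doc.split()][:k]


def allocate(scores, budget):
    """Split an integer budget across databases in proportion to their
    scores, using largest-remainder rounding so the parts sum to budget."""
    total = sum(scores.values())
    if total == 0:
        # No quality signal yet: fall back to an even split.
        scores = {name: 1 for name in scores}
        total = len(scores)
    exact = {name: budget * s / total for name, s in scores.items()}
    alloc = {name: int(x) for name, x in exact.items()}
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def fetch_new(db, sample, n):
    """Probe db with terms drawn from the current sample and return up to
    n documents that have not been sampled yet."""
    seen = set(sample)
    new = []
    for term in sorted(vocabulary(sample)):
        for doc in db:
            if len(new) == n:
                return new
            if term in doc.split() and doc not in seen:
                seen.add(doc)
                new.append(doc)
    return new


def two_phase_sample(databases, seed_queries, seed_k, budget, rounds):
    samples = {name: [] for name in databases}

    # Phase 1: seed sampling, the same small probe for every database.
    for name, db in databases.items():
        for query in seed_queries:
            for doc in query_sample(db, query, seed_k):
                if doc not in samples[name]:
                    samples[name].append(doc)

    # Phase 2: quality-aware iterative sampling. Each round re-estimates
    # quality from the samples so far and reschedules the round's budget.
    per_round = budget // rounds
    for _ in range(rounds):
        # Assumed quality proxy: vocabulary size of the current sample.
        scores = {name: len(vocabulary(s)) for name, s in samples.items()}
        for name, n in allocate(scores, per_round).items():
            samples[name].extend(fetch_new(databases[name], samples[name], n))
    return samples


databases = {
    "rich": ["apple banana", "banana cherry", "cherry date", "date egg"],
    "poor": ["apple apple", "apple"],
}
samples = two_phase_sample(databases, ["apple"], seed_k=1, budget=3, rounds=1)
```

In this toy run the database whose seed sample shows the larger vocabulary receives the larger share of the second-phase budget, which is the scheduling idea the abstract describes; a real system would issue queries against remote search interfaces rather than scan in-memory lists.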

James Caverlee, Ling Liu, Joonsoo Bae
