Sciweavers

CIKM
2010
Springer

Online stratified sampling: evaluating classifiers at web-scale

Deploying a classifier in large-scale systems such as the web requires careful feature design and performance evaluation. Evaluation is particularly challenging because these large collections change frequently. In this paper we adapt stratified sampling techniques to evaluate the precision of classifiers deployed in large-scale systems. We investigate different types of stratification strategies, and then we derive a new online sampling algorithm that incrementally approximates the theoretically optimal disproportionate sampling strategy. In experiments, the proposed algorithm significantly outperforms both simple random sampling and other types of stratified sampling, with an average reduction of about 20% in labeling effort to reach the same confidence and interval bounds on precision.
Categories and Subject Descriptors: H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness)
General Terms: Algorithms, Design, Experimentation
Keywords: Stratified sampl...
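For readers unfamiliar with disproportionate stratified sampling, the following is a minimal textbook-style sketch, not the paper's online algorithm. It assumes that the "optimal disproportionate" strategy corresponds to what the survey-sampling literature calls Neyman allocation (labeling budget per stratum proportional to stratum size times within-stratum standard deviation), and that rough per-stratum precision guesses are available up front. The function names `neyman_allocation` and `stratified_precision` are hypothetical, and the variance formula ignores the finite-population correction for simplicity.

```python
import math


def neyman_allocation(sizes, precision_guesses, budget):
    """Split a labeling budget across strata in proportion to N_h * sigma_h.

    For a binary "was the prediction correct?" label, the within-stratum
    standard deviation is sigma_h = sqrt(p_h * (1 - p_h)).
    Returns fractional allocations; rounding is left to the caller.
    """
    weights = [n * math.sqrt(p * (1.0 - p))
               for n, p in zip(sizes, precision_guesses)]
    total = sum(weights)
    if total == 0:
        # Every stratum looks deterministic: fall back to proportional allocation.
        total_n = sum(sizes)
        return [budget * n / total_n for n in sizes]
    return [budget * w / total for w in weights]


def stratified_precision(sizes, labeled_correct, labeled_total):
    """Stratified estimate of overall precision and its standard error.

    sizes[h]           -- number of items in stratum h
    labeled_correct[h] -- labeled items in stratum h judged correct
    labeled_total[h]   -- labeled items in stratum h (must be > 0)
    """
    total_n = sum(sizes)
    estimate = 0.0
    variance = 0.0
    for n_h, c_h, m_h in zip(sizes, labeled_correct, labeled_total):
        w_h = n_h / total_n              # stratum weight
        p_h = c_h / m_h                  # precision observed in stratum h
        estimate += w_h * p_h
        variance += (w_h ** 2) * p_h * (1.0 - p_h) / m_h
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical numbers: two equally sized strata, one uncertain, one confident.
    allocation = neyman_allocation([1000, 1000], [0.5, 0.9], budget=80)
    print("labels per stratum:", allocation)
    print("estimate, std. error:",
          stratified_precision([1000, 1000], [25, 27], [50, 30]))
```

The uncertain stratum (guessed precision near 0.5) receives more of the labeling budget than the confident one, which is the intuition behind spending labels where the classifier's correctness varies most.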
Paul N. Bennett, Vitor R. Carvalho
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where CIKM
Authors Paul N. Bennett, Vitor R. Carvalho