Online stratified sampling: evaluating classifiers at web-scale

Deploying a classifier to large-scale systems such as the web requires careful feature design and performance evaluation. Evaluation is particularly challenging because these large collections change frequently. In this paper we adapt stratified sampling techniques to evaluate the precision of classifiers deployed in large-scale systems. We investigate different types of stratification strategies, and then we derive a new online sampling algorithm that incrementally approximates the theoretical optimal disproportionate sampling strategy. In experiments, the proposed algorithm significantly outperforms both simple random sampling and other types of stratified sampling, with an average reduction of about 20% in labeling effort to reach the same confidence and interval bounds on precision.

Categories and Subject Descriptors: H.3.4 [Systems and Software]: Performance evaluation (efficiency and effectiveness)

General Terms: Algorithms, Design, Experimentation

Keywords: Stratified sampl...
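For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of textbook stratified estimation of precision and of Neyman allocation, the classical variance-minimizing disproportionate allocation. It is not the paper's online algorithm, and equating the abstract's "theoretical optimal disproportionate sampling strategy" with Neyman allocation is an assumption made here. The function names, the per-stratum precision guesses, and the example numbers are all hypothetical.

```python
import math


def neyman_allocation(sizes, precision_guesses, budget):
    """Split a labeling budget across strata in proportion to N_h * sigma_h.

    sizes             -- number of predicted positives in each stratum (N_h)
    precision_guesses -- assumed per-stratum precision p_h (hypothetical input)
    budget            -- total number of labels available
    Returns fractional sample sizes; rounding is left to the caller.
    """
    # For a 0/1 label, the per-stratum standard deviation is sqrt(p * (1 - p)).
    weights = [n * math.sqrt(p * (1.0 - p))
               for n, p in zip(sizes, precision_guesses)]
    total = sum(weights)
    if total == 0:
        # Every stratum is assumed pure; fall back to proportional allocation.
        weights = [float(n) for n in sizes]
        total = sum(weights)
    return [budget * w / total for w in weights]


def stratified_precision(sizes, labels_per_stratum):
    """Estimate overall precision and its variance from stratified labels.

    labels_per_stratum -- one list of 0/1 judgments per stratum
    The variance ignores the finite-population correction.
    """
    population = sum(sizes)
    estimate = 0.0
    variance = 0.0
    for n_h, labels in zip(sizes, labels_per_stratum):
        m = len(labels)
        if m == 0:
            raise ValueError("each stratum needs at least one label")
        p_h = sum(labels) / m
        w_h = n_h / population
        estimate += w_h * p_h
        if m > 1:
            # Unbiased variance estimate of a sample proportion.
            variance += w_h ** 2 * p_h * (1.0 - p_h) / (m - 1)
    return estimate, variance


if __name__ == "__main__":
    print(neyman_allocation([1000, 1000], [0.5, 0.9], budget=100))
    print(stratified_precision([1000, 3000], [[1, 1, 0, 1], [1, 0]]))
```

In this sketch, strata whose labels are most uncertain (precision near 0.5) receive the most labels, which is the intuition behind spending less labeling effort for the same interval width. An online variant would presumably re-estimate the per-stratum precision as labels arrive, but the details of that are specific to the paper and are not reproduced here.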

Paul N. Bennett, Vitor R. Carvalho

CIKM 2010 | Information Technology | Large-scale Systems | Performance Evaluation | Stratified Sampling |

Post Info

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Conference
Year	2010
Where	CIKM
Authors	Paul N. Bennett, Vitor R. Carvalho
