Enterprise and web data processing and content aggregation systems often require extensive use of human-reviewed data (e.g., for training and monitoring machine-learning-based applications). Today, these needs are often met by in-house efforts or outsourced offshore contracting. Emerging applications attempt to provide automated collection of human-reviewed data at Internet scale. We conduct extensive experiments to study the effectiveness of one such application. We also study the feasibility of using Yahoo! Answers, a general question-answering forum, for human-reviewed data collection.

Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous; J.4 [Social and Behavioral Sciences]

General Terms: Experimentation, Measurement, Human Factors

Keywords: human data, manual review, data collection
Qi Su, Dmitry Pavlov, Jyh-Herng Chow, Wendell C. B