Get another label? Improving data quality and data mining using multiple, noisy labelers

Download pages.stern.nyu.edu

This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simpl...
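Result (i) can be illustrated with a minimal sketch. It assumes binary labels, majority voting over an odd number of labels, and independent labelers who are each correct with the same probability p. These assumptions and the function name `majority_vote_quality` are illustrative choices made here; the abstract does not specify them.

```python
from math import comb


def majority_vote_quality(p: float, n: int) -> float:
    """Probability that the majority of n labels is correct.

    Assumes (hypothetically) binary labels and n independent labelers,
    each correct with probability p. n must be odd so ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of labels to avoid ties")
    # Sum the binomial probabilities of getting more than n/2 correct labels.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Compare single labeling (n = 1) with repeated labeling for several accuracies.
for p in (0.4, 0.5, 0.7):
    qualities = [round(majority_vote_quality(p, n), 3) for n in (1, 3, 5, 11)]
    print(f"p={p}: {qualities}")
```

Under these assumptions, extra labels raise the quality of the voted label only when p is above 0.5. At p = 0.5 the quality stays at 0.5, and below 0.5 it gets worse, which is one simple way to read "but not always".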

Victor S. Sheng, Foster J. Provost, Panagiotis G. Ipeirotis

Data Mining | KDD 2008 | Less-than-expert Labeling | Low-cost Labeling | Single Labeling

» Improving Quality of Training Data for Learning to Rank Using ClickThrough Data

» Hierarchical Joint Learning Improving Joint Parsing and Named Entity Recognition with NonJ...

» TwoView Transductive Support Vector Machines

» A Probabilistic Framework to Learn from Multiple Annotators with TimeVarying Accuracy

» Exploiting query click logs for utterance domain detection in spoken language understandin...

» SNARE a link analytic system for graph labeling and risk detection

» YouTubeCat Learning to Categorize Wild Web Videos

» Robust Collective Classification with Contextual Dependency Network Models

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2008
Where	KDD
Authors	Victor S. Sheng, Foster J. Provost, Panagiotis G. Ipeirotis
