
ICTIR
2009
Springer


Training Data Cleaning for Text Classification



Abstract. In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing the effectiveness of the resulting classifiers while minimizing the required amount of training effort. Training data cleaning (TDC) consists in devising ranking functions that sort the original training examples by how likely it is that the human annotator has misclassified them, thereby giving the annotator a convenient means of revising the training set to improve its quality. Working in the context of boosting-based learning methods, we present three different techniques for performing TDC and, on two widely used TC benchmarks, evaluate them by their ability to spot misclassified texts purposefully inserted in the training set.
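The ranking idea in the abstract can be illustrated with a minimal sketch. It is not one of the paper's three techniques, which the abstract does not describe; it only assumes that some classifier (for instance a boosting-based one) has produced a real-valued score for each training example, and that labels are encoded as +1 or -1. The function names `rank_by_suspicion` and `precision_at_k` and the toy data are hypothetical. Examples are ranked by the product of label and score, so that those where the classifier most strongly disagrees with the annotator come first; the ranking is then checked against a set of labels that were flipped on purpose, mirroring the evaluation the abstract mentions.

```python
from typing import List, Sequence, Set


def rank_by_suspicion(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return example indices sorted from most to least suspicious.

    scores[i] is a real-valued classifier output for example i (positive
    means the classifier leans towards the category). labels[i] is the
    annotator's label, +1 or -1. The product label * score is negative
    when the classifier disagrees with the annotator; the more negative
    it is, the stronger the disagreement.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    margins = [y * s for s, y in zip(scores, labels)]
    return sorted(range(len(margins)), key=lambda i: margins[i])


def precision_at_k(ranking: Sequence[int], flipped: Set[int], k: int) -> float:
    """Fraction of the top-k ranked examples whose labels were deliberately flipped."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    return sum(1 for i in top if i in flipped) / k


# Toy data (assumed): the labels of examples 3 and 4 were flipped on purpose.
scores = [2.1, -1.7, 0.3, -2.5, 1.9]
labels = [+1, -1, +1, +1, -1]
flipped = {3, 4}

ranking = rank_by_suspicion(scores, labels)
print(ranking)
print(precision_at_k(ranking, flipped, k=2))
```

In practice the scores would probably need to come from a model that was not trained on the example being scored (for instance via cross-validation), since a classifier fitted on a mislabelled example may simply reproduce the wrong label; this is an assumption of the sketch, not a claim about the paper's setup.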

Andrea Esuli, Fabrizio Sebastiani




Related Content

» A Lightweight and Efficient Tool for Cleaning Web Pages

» Negative Training Data Can be Harmful to Text Classification

» Sampling the Web as Training Data for Text Classification

» Semantic Smoothing for Bayesian Text Classification with Small Training Data

» Text Classification using the Concept of Association Rule of Data Mining

» Web Page Cleaning for Web Mining through Feature Weighting

» CBC Clustering Based Text Classification Requiring Minimal Labeled Data

» Transductive LSI for Short Text Classification Problems

» Training Paradigms for Correcting Errors in Grammar and Usage

Post Info

Added	19 Feb 2011
Updated	19 Feb 2011
Type	Journal
Year	2009
Where	ICTIR
Authors	Andrea Esuli, Fabrizio Sebastiani
