Search Sciweavers | Sciweavers

115 search results - page 1 / 23

ICTIR
2009
Springer

129 views, Information Technology

Training Data Cleaning for Text Classification

15 years 4 months ago

Download nmis.isti.cnr.it

Abstract. In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing t...

Andrea Esuli, Fabrizio Sebastiani

LREC
2008

108 views, Education

A Lightweight and Efficient Tool for Cleaning Web Pages

15 years 8 months ago

Download www.lrec-conf.org

Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...

Stefan Evert

EMNLP
2010

138 views, Natural Language Processing

Negative Training Data Can be Harmful to Text Classification

15 years 4 months ago

Download www.aclweb.org

This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...

Xiaoli Li, Bing Liu, See-Kiong Ng

IJDLS
2010

108 views

Sampling the Web as Training Data for Text Classification

15 years 3 months ago

Download irlab.csie.ntu.edu.tw

Data acquisition is a major concern in text classification. The excessive human efforts required by conventional methods to build up quality training collection might not always b...

Wei-Yen Day, Chun-Yi Chi, Ruey-Cheng Chen, Pu-Jen ...

SDM
2008
SIAM

133 views, Data Mining

Semantic Smoothing for Bayesian Text Classification with Small Training Data

15 years 8 months ago

Download www.cis.drexel.edu

Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...

Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
