Semantic Smoothing for Bayesian Text Classification with Small Training Data

Bayesian text classifiers face a common issue known as the data sparsity problem, especially when the training set is very small. The frequently used Laplacian smoothing and corpus-based background smoothing are not effective in handling it. Instead, we propose a novel semantic smoothing method to address the sparsity problem. Our method extracts explicit topic signatures (e.g., words, multiword phrases, and ontology-based concepts) from a document and then statistically maps them into single-word features. We conduct comprehensive experiments on three test collections (OHSUMED, LATimes, and 20NG) to compare semantic smoothing with other approaches. When the number of training documents is small, the Bayesian classifier with semantic smoothing not only outperforms the classifiers with background smoothing and Laplacian smoothing, but also beats state-of-the-art active learning classifiers and SVM classifiers. In this paper, we also compare three types of topic sig...
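The abstract describes semantic smoothing only at a high level, so the following is a hedged illustration rather than the authors' implementation. It assumes semantic smoothing can be written as a linear interpolation between a background-smoothed class model and a topic-signature "translation" model, p(w|c) = (1 - λ) p_b(w|c) + λ Σ_k p(w|t_k) p(t_k|c). Topic-signature extraction and the statistical estimation of p(w|t_k) are not shown. The toy vocabulary, the topic signatures ("ball game", "heart attack"), the translation table, and the weights `alpha` and `lam` are all hypothetical. Laplacian and background smoothing are included for comparison.

```python
import math
from collections import Counter


def laplace_model(class_counts, vocab):
    """Laplacian (add-one) smoothing: p(w|c) = (c(w,c) + 1) / (|c| + |V|)."""
    total = sum(class_counts.values())
    return {w: (class_counts.get(w, 0) + 1) / (total + len(vocab)) for w in vocab}


def background_model(class_counts, collection_model, vocab, alpha=0.5):
    """Background smoothing (one common form): mix the class maximum-likelihood
    estimate with a corpus model, p_b(w|c) = (1 - alpha) p_ml(w|c) + alpha p(w|C)."""
    total = sum(class_counts.values())
    return {
        w: (1 - alpha) * class_counts.get(w, 0) / total + alpha * collection_model[w]
        for w in vocab
    }


def semantic_model(base_model, topic_counts, translation, vocab, lam=0.3):
    """Assumed semantic smoothing form: p(w|c) = (1 - lam) p_b(w|c) + lam p_t(w|c),
    with p_t(w|c) = sum_k p(w|t_k) p(t_k|c). `translation[t][w]` is a hypothetical,
    precomputed topic-to-word probability table p(w|t)."""
    total_topics = sum(topic_counts.values())
    if total_topics == 0:
        return dict(base_model)  # no topic signatures: fall back to the base model
    p_t = dict.fromkeys(vocab, 0.0)
    for topic, n in topic_counts.items():
        p_topic = n / total_topics  # maximum-likelihood estimate of p(t_k|c)
        for w, p_w in translation.get(topic, {}).items():
            if w in p_t:
                p_t[w] += p_w * p_topic
    return {w: (1 - lam) * base_model[w] + lam * p_t[w] for w in vocab}


def classify(doc_words, class_models, priors):
    """Multinomial naive Bayes decision: argmax_c log p(c) + sum_w log p(w|c)."""
    scores = {
        c: math.log(priors[c])
        + sum(math.log(model[w]) for w in doc_words if w in model)
        for c, model in class_models.items()
    }
    return max(scores, key=scores.get)


# Toy data: all words, topics, and probabilities below are hypothetical.
vocab = ["game", "team", "score", "heart", "cardiac", "patient"]
train = {
    "sports": ["game", "team", "score", "team"],
    "medicine": ["heart", "patient", "heart"],
}
topics = {
    "sports": Counter({"ball game": 1}),
    "medicine": Counter({"heart attack": 1}),
}
translation = {
    "ball game": {"game": 0.6, "team": 0.2, "score": 0.2},
    "heart attack": {"heart": 0.4, "cardiac": 0.4, "patient": 0.2},
}

# Corpus model is Laplace-smoothed so every vocabulary word gets nonzero mass.
collection = laplace_model(Counter(w for doc in train.values() for w in doc), vocab)
bg = {c: background_model(Counter(doc), collection, vocab) for c, doc in train.items()}
sem = {c: semantic_model(bg[c], topics[c], translation, vocab) for c in train}

prediction = classify(["cardiac", "patient"], sem, {"sports": 0.5, "medicine": 0.5})
print(prediction)
```

In this toy run, "cardiac" never occurs in the medicine training text, so under background smoothing its class probability comes only from the corpus model. The hypothetical "heart attack" signature maps part of its mass onto "cardiac", raising that probability, which is the kind of effect semantic smoothing targets when training data is small.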

Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu

Background Smoothing | Data Mining | Laplacian Smoothing | SDM 2008 | Semantic Smoothing

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2008
Where	SDM
Authors	Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu

Sciweavers
