Language and Task Independent Text Categorization with Simple Language Models

We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple information-theoretic principles and achieves effective performance across a variety of languages and tasks without requiring feature selection or extensive pre-processing. To demonstrate the language and task independence of the proposed technique, we present experimental results on several languages (Greek, English, Chinese and Japanese) in several text categorization problems (language identification, authorship attribution, text genre classification, and topic detection). Our experimental results show that the simple approach achieves state-of-the-art performance in each case.
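
The abstract does not spell out the model details, so the following is only a minimal sketch of the general idea it describes: train one character-level n-gram model per category and assign a text to the category whose model gives it the lowest cross-entropy. The class and function names (`CharNgramModel`, `train_models`, `classify`), the trigram default, the add-one smoothing, and the uniform prior over categories are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character n-gram model with add-one smoothing.

    Add-one smoothing is an assumption chosen for brevity; it is not
    claimed to be the smoothing used in the paper.
    """

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of n-character sequences
        self.contexts = Counter()  # counts of their (n-1)-character prefixes
        self.vocab = set()         # characters seen in training

    def _pad(self, text):
        # Pad with spaces so the first characters also have a context.
        return " " * (self.n - 1) + text

    def train(self, text):
        padded = self._pad(text)
        self.vocab.update(padded)
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1

    def cross_entropy(self, text):
        """Average negative log2 probability per character of `text`."""
        padded = self._pad(text)
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total, count = 0.0, 0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            p = (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + v)
            total -= math.log2(p)
            count += 1
        return total / max(count, 1)


def train_models(labeled_texts, n=3):
    """Build one model per label from (label, text) pairs."""
    models = {}
    for label, text in labeled_texts:
        models.setdefault(label, CharNgramModel(n)).train(text)
    return models


def classify(models, text):
    """Pick the label whose model has the lowest cross-entropy on `text`.

    This ignores category priors, i.e. it assumes they are uniform.
    """
    return min(models, key=lambda label: models[label].cross_entropy(text))
```

A call such as `classify(train_models(pairs), document)` returns one of the labels in `pairs`. Because the model works on raw characters, the same code applies unchanged to languages without whitespace word boundaries, which is in the spirit of the language independence the abstract claims.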

Fuchun Peng, Dale Schuurmans, Shaojun Wang

Character-level N-gram Language | Independent Text Categorization | NAACL 2003 | NAACL 2007 | Text Categorization

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	NAACL
Authors	Fuchun Peng, Dale Schuurmans, Shaojun Wang
