
SMC
2007
IEEE

Text categorization based on the ratio of word frequency in each categories

In the present paper, we treat automatic text categorization as a series of information-processing steps and propose a new classification technique called the Frequency Ratio Accumulation Method (FRAM). This is a simple technique that calculates the sum of the ratios of word frequencies in each category. Moreover, FRAM places no limit on the number of feature terms that can be used. Exploiting this property, we propose using character N-grams and word N-grams as feature terms. We then evaluate the proposed technique through a number of experiments, in which we classify newspaper articles from the Japanese CD-Mainichi 2002 and the English Reuters-21578 collections using the Naive Bayes method (the baseline) and the proposed method. The results show that the classification accuracy of the proposed method is far better than that of the baseline. Specifically, the classification accuracy of the proposed method is 87.3% for Japanese CD-Mainichi 2002 and 86.1% for En...
Makoto Suzuki, Shigeichi Hirasawa
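
The abstract describes FRAM only as summing the ratios of word frequency in each category, so the following Python sketch is one plausible reading rather than the authors' exact formulation. It assumes that the ratio for a term and a category is that term's frequency in the category divided by its total frequency over all categories, and that a document is assigned to the category with the largest accumulated ratio. The function names (`char_ngrams`, `train`, `classify`) and the toy training data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=2):
    """Count how often each n-gram occurs in each category."""
    counts = defaultdict(Counter)  # term -> Counter(category -> frequency)
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            counts[term][category] += 1
    return counts


def classify(text, counts, n=2):
    """Accumulate per-category frequency ratios; return (best, scores).

    Assumed ratio: freq(term, category) / sum of freq(term, c) over all c.
    """
    scores = Counter()
    for term in char_ngrams(text, n):
        per_category = counts.get(term)
        if not per_category:
            continue  # a term never seen in training contributes nothing
        total = sum(per_category.values())
        for category, freq in per_category.items():
            scores[category] += freq / total
    best = max(scores, key=scores.get) if scores else None
    return best, dict(scores)


# Toy data, invented for illustration only.
training = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks fell as rates rose", "finance"),
]
model = train(training)
label, scores = classify("interest rates and stocks", model)
print(label, scores)
```

Because every term seen in training simply adds its ratios to the running totals, no feature-selection step is needed in this sketch, which matches the abstract's remark that feature terms can be used without limit. Swapping `char_ngrams` for a word-level tokenizer would give the word N-gram variant.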
Added: 04 Jun 2010
Updated: 04 Jun 2010
Type: Conference
Year: 2007
Where: SMC
Authors: Makoto Suzuki, Shigeichi Hirasawa