With the ever-increasing number of digital documents, the ability to classify those documents automatically, both quickly and accurately, is becoming more critical and more difficult. We present Fast Algorithm for Categorizing Text (FACT), a statistics-based multi-way classifier that uses our proposed feature selection method, Ambiguity Measure (AM), to predict the category of a document from only its most unambiguous keywords. Our empirical results on four benchmark datasets show that FACT outperforms the best results obtained with Odds Ratio, the best-performing feature selection method for the Naïve Bayes classifier, and that the improvement is statistically significant at the 99% confidence level. Furthermore, the performance of FACT is comparable to or better than that of current non-statistical classifiers.
Saket S. R. Mengle, Nazli Goharian, Alana Platt