– This paper describes a text categorization approach that is based on a combination of a newly designed text representation with a kNN classifier. The new text document representation explored here is based an unsupervised learning mechanism – a hierarchical structure of Self-Organizing Feature Maps. Through this architecture, a document can be encoded to a sequence of neurons and the corresponding distances to the neurons, while the temporal sequences of words as well as their frequencies are kept. Combining this representation with the power of kNN classifier achieved a good performance (Micro average F1-measure 0.855) on the experimental data set. It shows that this architecture can capture the characteristic temporal sequences of documents/categories which can be used for various text categorization and clustering task s.
Xiao Luo, A. Nur Zincir-Heywood