This paper investigates the applicability of a distributed clustering technique called RACHET [1] to organizing large sets of distributed text data. Although the authors of RACHET c...
This paper describes the results of some experiments exploring statistical methods to infer syntactic categories from a raw corpus in an unsupervised fashion. It shares certain po...
Model M, a novel class-based exponential language model, has been shown to significantly outperform word n-gram models in state-of-the-art machine translation and speech recognit...
We study dimensionality reduction, or feature selection, in the text document categorization problem. We focus on the first step in building text categorization systems, that is, the cho...
After extracting terms from a corpus of titles and abstracts in English, syntactic variation relations are identified amongst them in order to detect research topics. Three types of synta...