Learning-Free Text Categorization

15 years 12 months ago

Download www.natlang.hcuge.ch

In this paper, we report on the fusion of simple retrieval strategies with thesaural resources to perform large-scale text categorization tasks. Unlike most related systems, which rely on training data to infer text-to-concept relationships, our approach can be applied with any controlled vocabulary and does not use any training data. The first classification module uses a traditional vector-space retrieval engine, which has been fine-tuned for the task, while the second classifier is based on regular variations of the concept list. For evaluation purposes, the system uses a sample of MedLine and the Medical Subject Headings (MeSH) terminology as the collection of concepts. Preliminary results show that the hybrid system performs significantly better than either system alone. For the top returned concepts, the system reaches performance comparable to machine learning systems, while genericity and scalability issues are clearly in favor of the learn...
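The two-module design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme, the variation rules, or the fusion formula, so the smoothed tf-idf weights, the crude plural-stripping pattern, the linear combination, and every name here (`vector_space_scores`, `pattern_scores`, `categorize`, `alpha`, `top_k`) are assumptions chosen only to show the general idea of ranking controlled-vocabulary terms without training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vector_space_scores(document, concepts):
    """Cosine similarity between the document and each concept term.

    Assumption: idf is computed over the concept list itself, since no
    training collection is available in a learning-free setting.
    """
    concept_tokens = {c: tokenize(c) for c in concepts}
    n = len(concepts)
    df = Counter(t for toks in concept_tokens.values() for t in set(toks))
    idf = {t: math.log((n + 1) / (count + 1)) + 1.0 for t, count in df.items()}

    def weigh(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    doc_vec = weigh(tokenize(document))
    doc_norm = math.sqrt(sum(w * w for w in doc_vec.values())) or 1.0
    scores = {}
    for concept, toks in concept_tokens.items():
        vec = weigh(toks)
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        dot = sum(w * doc_vec.get(t, 0.0) for t, w in vec.items())
        scores[concept] = dot / (norm * doc_norm)
    return scores


def pattern_scores(document, concepts):
    """Score 1.0 if a simple variation of the concept occurs in the text.

    Assumption: the only variations handled are an optional plural "s"
    and whitespace or hyphen separators between words.
    """
    text = document.lower()
    scores = {}
    for concept in concepts:
        parts = []
        for tok in tokenize(concept):
            stem = tok[:-1] if tok.endswith("s") and len(tok) > 3 else tok
            parts.append(re.escape(stem) + r"s?")
        pattern = r"\b" + r"[\s-]+".join(parts) + r"\b"
        scores[concept] = 1.0 if re.search(pattern, text) else 0.0
    return scores


def categorize(document, concepts, alpha=0.5, top_k=3):
    """Rank concepts by a linear combination of the two module scores."""
    vs = vector_space_scores(document, concepts)
    pm = pattern_scores(document, concepts)
    fused = {c: alpha * vs[c] + (1 - alpha) * pm[c] for c in concepts}
    return sorted(concepts, key=lambda c: fused[c], reverse=True)[:top_k]


concepts = ["Liver Neoplasms", "Heart Failure", "Kidney Transplantation"]
doc = "We studied patients with a liver neoplasm after chemotherapy."
top = categorize(doc, concepts, top_k=1)
```

In this toy run, the pattern module matches the singular "liver neoplasm" against the plural heading, while the vector-space module contributes a partial match on the shared token "liver". A real system would need a far richer variation grammar and a full-size terminology such as MeSH.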

Patrick Ruch, Robert H. Baud, Antoine Geissbü
