
CICLING
2007
Springer

On the Impact of Lexical and Linguistic Features in Genre- and Domain-Based Categorization

Abstract. Classification by genre and domain is a major field of research in Information Retrieval (scientific and technical watch, data mining, etc.), and selecting appropriate descriptors to characterize and classify texts is crucial to it. Most practical experiments assume that domains correlate with the content level (words, tokens, lemmas, etc.) and genres with the morphosyntactic or linguistic level (function words, POS, etc.). However, the variables currently in use are generally not accurate enough for the categorization task. The present study assesses the impact of the lexical and linguistic levels on genre and domain categorization. Our empirical results demonstrate how important it is to select a tagset that meets the requirements of the task. The results also assess the efficiency of the linguistic level for both genre- and domain-based categorization.
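To make the distinction between the two descriptor levels concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' method: the function-word list, the helper names, and the sample sentence are all assumptions, and function-word frequencies stand in for the richer morphosyntactic tagsets the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption,
# not the tagset used in the paper).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "that", "we", "it"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_features(text):
    """Content level: counts of tokens that are not function words."""
    return Counter(t for t in tokenize(text) if t not in FUNCTION_WORDS)

def linguistic_features(text):
    """Linguistic level: relative frequency of each function word."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return {w: counts[w] / total for w in sorted(FUNCTION_WORDS)}

# Illustrative sample sentence (made up for this sketch).
sample = "We show that the protein binds to the receptor in the cell."
lex = lexical_features(sample)
ling = linguistic_features(sample)
```

Under this sketch, `lex` would feed a domain classifier (it keeps topical words such as "protein"), while `ling` would feed a genre classifier (it keeps only the distribution of grammatical words).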
Guillaume Cleuziou, Céline Poudat
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where CICLING
Authors Guillaume Cleuziou, Céline Poudat