ROCAI
2004
Springer

Learning Interestingness Measures in Terminology Extraction. A ROC-based approach

Abstract. In Text Mining, a key phase of data preparation is the extraction of terms, i.e. collocations of words attached to specific concepts (e.g. Philosophy-Dissertation). In this paper, Term Extraction is formalized as a supervised learning task: a ranking hypothesis is learned from a set of terms labeled as relevant or irrelevant by the expert. This task is tackled using the evolutionary algorithm ROGER, which optimizes the area under the ROC curve attached to a ranking hypothesis. Empirical validation on two real-world applications demonstrates outstanding improvements compared to state-of-the-art interestingness measures in Term Extraction. The approach is found to be robust across domains (Molecular Biology, Curriculum Vitæ) and languages (English, French).
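To make the optimization criterion concrete, the following is a minimal sketch of the area under the ROC curve (AUC) for a ranking hypothesis, computed as the fraction of (relevant, irrelevant) term pairs that the ranking orders correctly. It is not the ROGER algorithm itself, which the abstract describes only as an evolutionary algorithm optimizing this quantity. The function name `auc` and the example scores and labels are assumptions made for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve of a ranking, via pairwise comparison.

    scores: value assigned to each candidate term by a ranking hypothesis
            (higher means "ranked as more relevant").
    labels: expert judgment for each term (True = relevant).
    """
    relevant = [s for s, y in zip(scores, labels) if y]
    irrelevant = [s for s, y in zip(scores, labels) if not y]
    if not relevant or not irrelevant:
        raise ValueError("need at least one relevant and one irrelevant term")

    correctly_ordered = 0.0
    for r in relevant:
        for i in irrelevant:
            if r > i:
                correctly_ordered += 1.0   # relevant term ranked above irrelevant one
            elif r == i:
                correctly_ordered += 0.5   # ties count as half
    return correctly_ordered / (len(relevant) * len(irrelevant))


# Hypothetical scores for four candidate terms and their expert labels.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [True, False, True, False]
print(auc(example_scores, example_labels))  # 3 of 4 pairs ordered correctly: 0.75
```

An AUC of 1.0 means every relevant term is ranked above every irrelevant one, while 0.5 corresponds to a ranking no better than chance. A learner in the spirit of the abstract would search for the scoring function that maximizes this value on the labeled terms.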
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where ROCAI
Authors Mathieu Roche, Jérôme Azé, Yves Kodratoff, Michèle Sebag