

A Corpus-Based Approach to Automatic Compound Extraction

An automatic compound retrieval method is proposed to extract compounds within a text message. It uses n-gram mutual information, relative frequency count, and parts of speech as the features for compound extraction. The problem is modeled as a two-class classification problem based on the distributional characteristics of n-gram tokens in the compound and the non-compound clusters. On a testing corpus of 49,314 words, the recall and precision of the proposed approach are 96.2% and 48.2% for bigram compounds and 96.6% and 39.6% for trigram compounds. A significant reduction in processing time has been observed.
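As a rough illustration of two of the features named in the abstract, the sketch below computes a mutual information score and a relative frequency for each bigram in a token list. It is not the authors' implementation: the abstract does not give the exact formulas, so this assumes the standard pointwise mutual information definition and takes relative frequency to be a bigram's count divided by the total number of bigrams. The function name `bigram_features` is hypothetical, and part-of-speech features and the two-class classifier are left out.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Return {bigram: (pmi, relative_frequency)} for adjacent token pairs.

    Assumptions (not taken from the paper): PMI is log2(P(x,y) / (P(x) P(y)))
    with maximum-likelihood estimates, and relative frequency is the bigram
    count divided by the total number of bigrams in the token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # avoid division by zero on tiny inputs

    features = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        features[(w1, w2)] = (pmi, p_xy)
    return features


if __name__ == "__main__":
    sample = "the stock market fell as the stock market index dropped".split()
    for bigram, (pmi, rel_freq) in sorted(bigram_features(sample).items()):
        print(bigram, round(pmi, 3), round(rel_freq, 3))
```

Under this setup, candidate bigrams with high scores on both features would be the ones passed to a classifier that separates compounds from non-compounds. The thresholds or decision rule would have to come from training data.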
Keh-Yih Su, Ming-Wen Wu, Jing-Shin Chang
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference
Year 1994
Where ACL