Sciweavers

CICLING
2008
Springer

Unsupervised and Knowledge-Free Learning of Compound Splits and Periphrases

Abstract. We present an approach for knowledge-free and unsupervised recognition of compound nouns for languages that use one-word compounds, such as Germanic and Scandinavian languages. Our approach works by creating a candidate list of compound splits based on the word list of a large corpus. We then filter this list using the following criteria: (a) frequencies of compounds and parts, (b) length of parts. In a second step, we search the corpus for periphrases, that is, reformulations of the (single-word) compound using the parts and very high-frequency words (which are usually prepositions or determiners). This step excludes spurious candidate splits at the cost of recall. To increase recall again, we train a trie-based classifier that also allows splitting multipart compounds iteratively. We evaluate our method for both steps and with various parameter settings for German against a manually created gold standard, showing promising results above 80% precision for the splits and about hal...
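The first two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (candidate_splits, has_periphrasis), the thresholds (min_part_len, min_part_freq, max_gap), the hand-picked function-word set, and the toy corpus are all assumptions made for illustration, and the abstract does not state the exact filtering criteria the paper uses.

```python
from collections import Counter


def candidate_splits(word, freq, min_part_len=3, min_part_freq=2):
    """Return two-part splits of `word` whose parts both occur in the corpus.

    Assumed filter: each part has at least `min_part_len` characters and
    occurs at least `min_part_freq` times in the corpus word list.
    """
    splits = []
    for i in range(min_part_len, len(word) - min_part_len + 1):
        left, right = word[:i], word[i:]
        if freq.get(left, 0) >= min_part_freq and freq.get(right, 0) >= min_part_freq:
            splits.append((left, right))
    return splits


def has_periphrasis(tokens, left, right, function_words, max_gap=2):
    """Check whether both parts co-occur with only function words in between.

    A match is one part, then 1..max_gap tokens that are all in
    `function_words`, then the other part (either order).
    """
    parts = {left, right}
    for i, tok in enumerate(tokens):
        if tok not in parts:
            continue
        other = right if tok == left else left
        # j is the position of the other part; i+1 .. j-1 is the gap.
        for j in range(i + 2, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] == other and all(t in function_words for t in tokens[i + 1:j]):
                return True
    return False


# Toy corpus (hypothetical); a real run would use a large corpus.
corpus = (
    "der holztisch ist ein tisch aus holz . "
    "das holz ist alt . der tisch ist neu"
).split()
freq = Counter(corpus)

# Hand-picked here; the abstract describes these as very high-frequency words.
function_words = {"aus", "der", "das", "ein"}

splits = candidate_splits("holztisch", freq)
confirmed = [s for s in splits if has_periphrasis(corpus, s[0], s[1], function_words)]
print(splits, confirmed)
```

On this toy corpus the only surviving candidate for "holztisch" is the split into "holz" and "tisch", and it is confirmed by the periphrasis "tisch aus holz". The sketch matches exact word forms only, so inflected parts would be missed, and the trie-based classifier mentioned in the abstract is not covered.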
Florian Holz, Chris Biemann
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where CICLING