Unsupervised and Knowledge-Free Learning of Compound Splits and Periphrases


Available from wortschatz.uni-leipzig.de

Abstract. We present an approach for knowledge-free and unsupervised recognition of compound nouns for languages that use one-word compounds, such as Germanic and Scandinavian languages. Our approach works by creating a candidate list of compound splits based on the word list of a large corpus. We then filter this list using the following criteria: (a) frequencies of compounds and parts, (b) length of parts. In a second step, we search the corpus for periphrases, that is, reformulations of the (single-word) compound using its parts and very high-frequency words (usually prepositions or determiners). This step excludes spurious candidate splits at the cost of recall. To increase recall again, we train a trie-based classifier that also allows splitting multi-part compounds iteratively. We evaluate our method for both steps and with various parameter settings for German against a manually created gold standard, showing promising results above 80% precision for the splits and about hal...
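The abstract describes three stages: generating and filtering candidate splits from a corpus word list, confirming splits via periphrases, and training a trie-based classifier that splits multi-part compounds iteratively. The Python sketch below shows one way these stages could fit together on a toy corpus. It is not the authors' implementation. The thresholds (`MIN_PART_LEN`, `MAX_GAP`), the rule that each part must be more frequent than the compound, the head-first periphrasis pattern, the hand-picked high-frequency word set, the split-ratio scoring in the trie, and all function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical parameters: the abstract names these criteria but not their values.
MIN_PART_LEN = 3  # minimum length of each part, in characters
MAX_GAP = 2       # max function words between head and modifier in a periphrasis


def candidate_splits(word, freq):
    """Two-part splits (modifier, head) passing the length and frequency filters.

    Assumed frequency rule: each part must be more frequent than the compound.
    Linking elements such as the German -s- are not handled in this sketch."""
    splits = []
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        left, right = word[:i], word[i:]
        if freq[left] > freq[word] and freq[right] > freq[word]:
            splits.append((left, right))
    return splits


def has_periphrasis(left, right, tokens, high_freq):
    """True if the corpus contains 'right w1..wk left', 1 <= k <= MAX_GAP, with
    every w_i a very high-frequency word (e.g. 'dach vom haus' for 'hausdach')."""
    for i, tok in enumerate(tokens):
        if tok != right:
            continue
        for k in range(1, MAX_GAP + 1):
            j = i + k + 1
            if j < len(tokens) and tokens[j] == left and all(
                    t in high_freq for t in tokens[i + 1:j]):
                return True
    return False


def confirm_splits(tokens, high_freq):
    """Keep, per word, the first candidate split that has a periphrasis."""
    freq = Counter(tokens)
    confirmed = {}
    for word in sorted(set(tokens)):
        for left, right in candidate_splits(word, freq):
            if has_periphrasis(left, right, tokens, high_freq):
                confirmed[word] = (left, right)
                break
    return confirmed


class TrieNode:
    def __init__(self):
        self.children = {}
        self.passes = 0  # confirmed compounds whose prefix reaches this node
        self.splits = 0  # confirmed compounds whose modifier ends at this node


def train_trie(confirmed):
    """Prefix trie over confirmed compounds, counting where their splits fell."""
    root = TrieNode()
    for compound, (left, _right) in confirmed.items():
        node = root
        for depth, ch in enumerate(compound, start=1):
            node = node.children.setdefault(ch, TrieNode())
            node.passes += 1
            if depth == len(left):
                node.splits += 1
    return root


def split_once(word, root, min_ratio=0.5):
    """Prefix length whose trie node has the highest split ratio, or None."""
    best, best_ratio = None, min_ratio
    node = root
    for depth, ch in enumerate(word, start=1):
        node = node.children.get(ch)
        if node is None:
            break
        if MIN_PART_LEN <= depth <= len(word) - MIN_PART_LEN:
            ratio = node.splits / node.passes
            if ratio >= best_ratio:  # ties favour the longer prefix
                best, best_ratio = depth, ratio
    return best


def split_iteratively(word, root):
    """Split off one modifier at a time until the trie proposes no further split."""
    parts = []
    while (pos := split_once(word, root)) is not None:
        parts.append(word[:pos])
        word = word[pos:]
    parts.append(word)
    return parts


tokens = ("das haus im garten ist alt . das dach vom haus ist neu . "
          "das gartenhaus hat ein dach . das hausdach ist rot . "
          "der garten und das haus").split()
high_freq = {"das", "der", "im", "vom", "und", "ist", "ein"}  # assumed function-word list
confirmed = confirm_splits(tokens, high_freq)
print(confirmed)  # {'gartenhaus': ('garten', 'haus'), 'hausdach': ('haus', 'dach')}
print(split_iteratively("gartenhausdach", train_trie(confirmed)))  # ['garten', 'haus', 'dach']
```

On this toy corpus, only the two compounds that have a periphrasis ("haus im garten", "dach vom haus") are confirmed. A trie trained on them then splits the unseen three-part compound "gartenhausdach" into garten | haus | dach by splitting off one modifier at a time, which mirrors the iterative multi-part splitting the abstract mentions. A realistic run would derive the high-frequency word set from corpus counts rather than listing it by hand.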

Florian Holz, Chris Biemann


CICLING 2008 | Compound Nouns | Compound Splits | Natural Language Processing | Spurious Candidate Splits


Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	CICLING
Authors	Florian Holz, Chris Biemann

Sciweavers
