

Harvesting Multi-Word Expressions from Parallel Corpora

The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multiword expressions. In the first approach, multiword expressions from Princeton WordNet are translated with a technique based on word alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multiword expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results show that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches, which are able to identify more term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed.
Spela Vintar, Darja Fiser
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Spela Vintar, Darja Fiser