Sciweavers

ACL
2015

IWNLP: Inverse Wiktionary for Natural Language Processing

Many natural language processing pipelines today rely on training data created by a few experts. This paper examines how the proliferation of the internet and its collaborative applications can be put to practical use for NLP. To that end, we examine how the German version of Wiktionary can be used for a lemmatization task. We introduce IWNLP, an open-source parser for Wiktionary that reimplements several MediaWiki markup language templates for conjugated verbs and declined adjectives. The lemmatization task is evaluated on three German corpora, on which we compare our results with existing lemmatization software. With Wiktionary as a resource, we obtain high accuracy for the lemmatization of nouns and can even improve on the results of existing software for this word class.
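As a rough illustration of the "inverse" idea in the title, the sketch below inverts a lemma-to-forms table into a form-to-lemma lookup. It is not IWNLP's actual code or data format: the toy table, the names `INFLECTION_TABLES`, `build_inverse_index`, and `lemmatize`, and the case-insensitive matching are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy data: lemma -> inflected forms. In a real setting these
# would come from inflection tables parsed out of Wiktionary entries.
INFLECTION_TABLES = {
    "Haus": ["Haus", "Hauses", "Hause", "Häuser", "Häusern"],
    "gehen": ["gehe", "gehst", "geht", "gehen", "ging", "gegangen"],
    "gut": ["gut", "gute", "guter", "gutes", "guten", "gutem"],
}


def build_inverse_index(tables):
    """Invert lemma -> forms into form -> set of candidate lemmas."""
    inverse = defaultdict(set)
    for lemma, forms in tables.items():
        for form in forms:
            # Lowercasing is a simplification for this sketch; German noun
            # capitalization carries information a real system may keep.
            inverse[form.lower()].add(lemma)
    return inverse


def lemmatize(token, inverse):
    """Return all known lemmas for a token, or an empty list if unknown."""
    return sorted(inverse.get(token.lower(), ()))


INVERSE_INDEX = build_inverse_index(INFLECTION_TABLES)
```

A form can map to more than one lemma, which is why the lookup returns a list rather than a single string; for example, `lemmatize("Häusern", INVERSE_INDEX)` yields only `"Haus"` here because the toy table contains no competing entry.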
Matthias Liebeck, Stefan Conrad 0001
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ACL
Authors Matthias Liebeck, Stefan Conrad 0001