Expanding the Lexicon for a Resource-Poor Language Using a Morphological Analyzer and a Web Crawler

Resource-poor languages may lack any of the basic resources that are fundamental to computational linguistics, including an adequate digital lexicon. Given the relatively small corpus of texts that exists for such languages, extending the lexicon presents a challenge. Languages with complex morphology present a special case, however, because individual words in these languages provide a great deal of information about the grammatical properties of the roots that they are based on. Given a morphological analyzer, it is even possible to extract novel roots from words. In this paper, we look at the case of Tigrinya, a Semitic language with limited lexical resources for which a morphological analyzer is available. We show that this analyzer, applied to a list of more than 200,000 Tigrinya words extracted by a web crawler, can extend the lexicon in two ways: by adding new roots and by inferring some of the derivational constraints that apply to known roots.
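
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `analyze` function, the toy analyses, the category labels, and the `min_forms` threshold are all hypothetical placeholders, and the word forms are invented strings rather than Tigrinya data. The sketch only shows the general shape of the idea: run an analyzer over a crawled word list, collect roots that are missing from the lexicon, and record which derivational categories are attested for roots already known.

```python
from collections import defaultdict

# Hypothetical stand-in for a real morphological analyzer: it maps a word
# form to zero or more (root, derivational_category) analyses. All entries
# are invented placeholders, not real Tigrinya data.
TOY_ANALYSES = {
    "form1": [("rootA", "simple")],
    "form2": [("rootA", "passive")],
    "form3": [("rootB", "simple")],
    "form4": [("rootB", "causative")],
    "form5": [("rootC", "simple")],
}


def analyze(word):
    """Return the list of candidate analyses for a word (empty if unparsed)."""
    return TOY_ANALYSES.get(word, [])


def expand_lexicon(words, known_roots, min_forms=2):
    """Collect candidate new roots and attested categories for known roots.

    min_forms is an assumed filter: a new root is proposed only if at least
    that many distinct word forms support it.
    """
    forms_by_root = defaultdict(set)
    categories_by_root = defaultdict(set)
    for word in words:
        for root, category in analyze(word):
            forms_by_root[root].add(word)
            categories_by_root[root].add(category)

    new_roots = sorted(
        root
        for root, forms in forms_by_root.items()
        if root not in known_roots and len(forms) >= min_forms
    )
    attested = {
        root: sorted(categories_by_root[root])
        for root in known_roots
        if root in categories_by_root
    }
    return new_roots, attested


crawled = ["form1", "form2", "form3", "form4", "form5", "unparsed"]
new_roots, attested = expand_lexicon(crawled, known_roots={"rootA"})
print(new_roots)
print(attested)
```

In this toy run, `rootB` is proposed as a new root because two distinct forms support it, while `rootC` is held back by the assumed threshold. The attested categories for a known root are only evidence about its derivational behavior; how such evidence is turned into constraints is not specified here.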

Michael Gasser

Adequate Digital Lexicon | Education | Limited Lexical Resources | LREC 2010 | Morphological Analyzer

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Michael Gasser
