Bootstrapping Language Description: the case of Mpiemo (Bantu A, Central African Republic)

Linguists have long been producing grammatical descriptions of yet undescribed languages. This is a time-consuming process, which has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap analysis of collected data and speed up manual selection work. To be more precise, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful. We exemplify this on Mpiemo, a so far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.

Harald Hammarström, Christina Thornell, Malin

Education | LREC 2008 | Raw Text Data | Undescribed Bantu Language | Undescribed Languages |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Harald Hammarström, Christina Thornell, Malin Petzell, Torbjörn Westerlund
