CICLING
2005
Springer

Disentangling from Babylonian Confusion - Unsupervised Language Identification

This work presents an unsupervised solution to language identification. The method sorts the sentences of a multilingual text corpus into the languages the corpus contains and makes no assumptions about the number or size of the monolingual fractions. Evaluation on 7-lingual and bilingual corpora shows that classification quality is comparable to that of supervised approaches and that the method is almost error-free once 100 or more sentences per language are available.
Christian Biemann, Sven Teresniak
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where CICLING