Disentangling from Babylonian Confusion - Unsupervised Language Identification

Download wortschatz.uni-leipzig.de

This work presents an unsupervised solution to language identification. The method sorts the sentences of a multilingual text corpus into the languages the corpus contains and makes no assumptions about the number or size of the monolingual fractions. Evaluation on 7-lingual and bilingual corpora shows that the classification quality is comparable to that of supervised approaches and that the method is almost error-free once at least 100 sentences per language are available.
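
To make the idea of sorting sentences by language without labels more concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not spell out. It is a deliberately simplified stand-in that assumes two sentences belong to the same language whenever they are linked by a chain of shared words (connected components of a word graph). The function name `cluster_sentences` and the whitespace tokenization are illustrative assumptions.

```python
from collections import defaultdict


def cluster_sentences(sentences):
    """Group sentences linked by shared words (connected components).

    Assumption for illustration only: languages share no tokens. Real
    corpora contain shared names and numbers, so a practical system would
    need a weighted clustering step instead of plain components.
    """
    parent = {}

    def find(word):
        # Union-find lookup with path halving.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    tokenized = [s.lower().split() for s in sentences]

    # Words that occur in the same sentence end up in the same component.
    for words in tokenized:
        for word in words[1:]:
            union(words[0], word)

    # Each sentence goes to the component of its first word.
    clusters = defaultdict(list)
    for sentence, words in zip(sentences, tokenized):
        if words:
            clusters[find(words[0])].append(sentence)
    return list(clusters.values())
```

Calling `cluster_sentences` on a small mixed list of English and German sentences with disjoint vocabularies yields one group per language, without the number of languages being specified in advance. A single token shared across languages would merge the groups, which is why this should be read as a toy illustration rather than a usable classifier.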

Christian Biemann, Sven Teresniak

7-lingual Corpora | Bilingual Corpora | CICLING 2005 | Multilingual Text Corpora | Natural Language Processing |

Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	CICLING
Authors	Christian Biemann, Sven Teresniak

Sciweavers
