Applying Monte Carlo Techniques to Language Identification

Two major stages in language identification systems can be identified: the language modeling stage, where the distinctive features of languages are determined and stored in models, and the classification stage, in which the model of the (partial) input document is compared to the reference language models. The language model most similar to the input document represents the language of the document. We describe the best-known modeling and classification techniques in the literature, and identify one disadvantage in them: the need to create a model of the entire document, even though the language can be identified with a small number of features. To avoid this, we introduce a new language identification technique that is based on Monte Carlo sampling. We show that, by determining the language of a large enough number of random features, we can determine the document language to be the language which results most often from these features. Whether the number of samples is suffic...
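
The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the choice of character trigrams as features, the rule that a sampled trigram votes for the language whose reference model gives it the highest relative frequency, the fixed `n_samples`, and the names `trigram_model` and `identify` are all assumptions made for illustration. The abstract's criterion for deciding whether enough samples were drawn is truncated, so it is not modeled here.

```python
import random
from collections import Counter


def trigram_model(text):
    """Toy reference model (assumption): relative frequency of each character trigram."""
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def identify(document, models, n_samples=200, seed=0):
    """Vote over randomly sampled trigrams instead of modeling the whole document.

    Returns (winning language or None, Counter of votes per language).
    """
    if len(document) < 3:
        return None, Counter()
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    votes = Counter()
    for _ in range(n_samples):
        start = rng.randrange(len(document) - 2)
        gram = document[start:start + 3]
        # Assumed per-feature rule: the language whose model rates this trigram highest.
        best_lang, best_score = None, 0.0
        for lang, model in models.items():
            score = model.get(gram, 0.0)
            if score > best_score:
                best_lang, best_score = lang, score
        if best_lang is not None:  # trigrams unknown to every model cast no vote
            votes[best_lang] += 1
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, votes


# Tiny hypothetical reference texts; real models would be built from large corpora.
models = {
    "en": trigram_model("the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    "nl": trigram_model("de snelle bruine vos springt over de luie hond en de kat zat op de mat"),
}
language, votes = identify("the lazy dog and the cat sat on the mat", models)
print(language, dict(votes))
```

Only the sampled positions are inspected, so under these assumptions the cost depends on the number of samples rather than on the document length, which is the advantage the abstract claims over building a model of the entire document.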

Arjen Poutsma

CLIN 2001 | CLIN 2004 | Input Document | Language Identification | Language Model |

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	CLIN
Authors	Arjen Poutsma
