Abstract. Automated language identification of written text is a wellestablished research domain that has received considerable attention in the past. By now, efficient and effecti...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma ...
Rushin Shah, Paramveer S. Dhillon, Mark Liberman, ...
Korean is an agglutinative language that does not have explicit word boundaries. It is also a highly inflective language that exhibits severe coarticulation effects. These charac...
Sakriani Sakti, Andrew M. Finch, Ryosuke Isotani, ...
This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of wellformed se...
This paper describes an approach to the treatment of nominal compounds in a machine translation project employing a modern unification-based system. General problems connected wit...
Pierrette Bouillon, Katharina Boesefeldt, Graham R...