There are many accurate methods for language identification of long text samples, but identification of very short strings still presents a challenge. This paper studies a languag...
Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also p...
Over the years, the field of Language Resources and Technology (LRT) has developed a tremendous amount of resources and tools. However, there is no ready-to-use map that researche...
Dieter Van Uytvanck, Claus Zinn, Daan Broeder, Pet...
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered...
We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genreannotated datasets (HGC, I-EN, KI-04, KRYS...
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applying the Uplug word alignment tool on the corpus and finally produce and evaluat...
Availability of labeled language resources, such as annotated corpora and domain dependent labeled language resources is crucial for experiments in the field of Natural Language ...