Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratic...
Caroline Sporleder, Linlin Li, Philip Gorinski, Xa...
We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same nume...
In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. Succe...
We present the GIVE-2 Corpus, a new corpus of human instruction giving. The corpus was collected by asking one person in each pair of subjects to guide the other person towards co...
Andrew Gargett, Konstantina Garoufi, Alexander Kol...
This presentation, with its accompanying demonstration, focuses on the development of a mobile platform for e-learning with enhanced text-to-speech capabilities. It reports on a...
Automatic language recognition on spontaneous speech has experienced rapid development in the last few years. This development has been in part due to the competitive technologi...
The paper describes an approach to expedite the process of manual annotation of a Hindi dependency treebank which is currently under development. We propose a way by which consist...
Previous content extraction evaluations have neglected to address problems which complicate the incorporation of extracted information into an existing knowledge base. Previous qu...
Paul McNamee, Hoa Trang Dang, Heather Simpson, Pat...
A speech database, named KALAKA, was created to support the Albayzin 2008 Evaluation of Language Recognition Systems, organized by the Spanish Network on Speech Technologies from ...