The production of rich multilingual speech corpus resources on a large scale is a requirement for many linguistic, phonetic and technological tasks, in both research and applicati...
The relevance of syntactic dependency annotated corpora is nowadays unquestioned. However, a broad debate on the optimal set of dependency relation tags did not take place yet. As...
Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled doc...
This paper focuses on the use of advanced techniques of text analysis as support for collocation extraction. A hybrid system is presented that combines statistical methods and mul...
This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with l...
Samuel Reese, Gemma Boleda, Montse Cuadros, Llu&ia...