This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around ...
Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life everyday. Therefore microblogg...
This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of wellformed se...
Recently, collaboratively constructed resources such as Wikipedia and Wiktionary have been discovered as valuable lexical semantic knowledge bases with a high potential in diverse...
Building a Linguistic Resource (LR) is a task requiring a huge quantitative of means, human resources and funds. Though finalization of the development phase and assessment of the...