In this paper, we propose an approach for identifying curatable articles from a large document set. This system considers three parts of an article (title and abstract, MeSH terms, and ca...
We investigate the lexical and syntactic flexibility of a class of idiomatic expressions. We develop measures that draw on such linguistic properties, and demonstrate that these s...
Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, th...
Opinion mining is a recent subdiscipline of computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. To aid the extr...
Detection of discourse structure is crucial in many text-based applications. This paper presents an original framework for describing textual parallelism which allows us to genera...
Faced with the problem of annotation errors in part-of-speech (POS) annotated corpora, we develop a method for automatically correcting such errors. Building on top of a successfu...
To tackle the problem of presenting a large number of options in spoken dialogue systems, we identify compelling options based on a model of user preferences, and present tradeoff...
This paper deals with the problem of recognizing and extracting acronym-definition pairs in Swedish medical texts. This project applies a rule-based method to solve the acronym rec...
Esfinge is a general-domain Portuguese question answering system. It tries to take advantage of the large amount of information available on the World Wide Web. Since Portuguese is...
In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness of a collection of documents (corpus), with respect to a number of biased pa...