Subjectivity is a pragmatic, sentence-level feature that has important implications for text processing applications such as information extraction and information retrieval. We s...
What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
In recent years, statistical language models are being proposed as alternative to the vector space model. Viewing documents as language samples introduces the issue of defining a...
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fit...
We present the process that has been followed for the development of an application that makes use of several heterogeneous Spanish public datasets that are related to administrat...