Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed envir...
Information filtering systems constitute a critical component in modern information-seeking applications. As the number of users grows and the information available becomes even bi...
Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summariza...
We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequ...
Fernando C. N. Pereira, Naftali Tishby, Lillian Le...
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
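The DCM snippet above says the model captures burstiness: once a word appears in a document, it is likely to appear again. A small computation makes this concrete. Below is a minimal sketch of the standard DCM (multivariate Pólya) log-probability of a word-count vector, compared with a plain multinomial. It assumes a document is given as a list of per-word counts and that the parameter vector `alpha` is already known. The function names `dcm_log_likelihood` and `multinomial_log_likelihood` are hypothetical helpers for illustration, and parameter estimation is not shown because the snippet is truncated before any fitting procedure.

```python
import math
from typing import Sequence


def dcm_log_likelihood(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of a word-count vector under a DCM (multivariate Polya) model.

    Hypothetical helper: counts[w] is how often word w occurs in the document,
    alpha[w] > 0 is the Dirichlet parameter for word w.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a_sum = sum(alpha)
    # Multinomial coefficient: log n! - sum_w log x_w!
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Dirichlet normalizer ratio: log Gamma(A) - log Gamma(n + A), with A = sum(alpha)
    ll += math.lgamma(a_sum) - math.lgamma(n + a_sum)
    # Per-word terms: log Gamma(x_w + alpha_w) - log Gamma(alpha_w)
    ll += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return ll


def multinomial_log_likelihood(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Log-probability of the same counts under a plain multinomial, for contrast."""
    n = sum(counts)
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Zero counts contribute nothing, so skip them (also avoids log(0)).
    ll += sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)
    return ll


if __name__ == "__main__":
    alpha = [0.1, 0.1]  # small alpha: strongly bursty word usage (assumed values)
    probs = [0.5, 0.5]  # multinomial with the same expected word proportions
    for doc in ([4, 0], [2, 2]):
        dcm_p = math.exp(dcm_log_likelihood(doc, alpha))
        mult_p = math.exp(multinomial_log_likelihood(doc, probs))
        print(doc, round(dcm_p, 4), round(mult_p, 4))
```

With the assumed `alpha = [0.1, 0.1]`, the DCM gives the repetitive document `[4, 0]` a probability of roughly 0.42 and the evenly mixed `[2, 2]` only about 0.04. The multinomial with equal word probabilities does the reverse (0.0625 versus 0.375). Smaller `alpha` values make repeated occurrences of an already-seen word more likely, which is how the DCM encodes burstiness.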