Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...
High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to...
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
Say you are looking for information about a particular person. A search engine returns many pages for that person's name, but which pages are about the person you care about, ...
WordNet (WN) is a lexical knowledge base, first developed for English and then adopted for several Western European languages, which was created as a machine-readable dictionary b...