There is strong demand for automated tools that extract pertinent information from the biomedical literature, which is a rich, complex, and dramatically growing resou...
A vast number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use ...
Taking advantage of the well-known cluster hypothesis that “closely associated documents tend to be relevant to the same request”, we can use inter-document similarity to prov...
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
It is well known that the main objective of conceptual retrieval models is to go beyond simple term matching by relaxing the term independence assumption through concept recognition. ...