Sciweavers

» A Method for Calculating Term Similarity on Large Document C...
DRR
2008
13 years 9 months ago
Hybrid approach combining contextual and statistical information for identifying MEDLINE citation terms
There is a strong demand for developing automated tools for extracting pertinent information from the biomedical literature that is a rich, complex, and dramatically growing resou...
In-Cheol Kim, Daniel X. Le, George R. Thoma
ICCS
2009
Springer
14 years 2 months ago
Frequent Itemset Mining for Clustering Near Duplicate Web Documents
A vast number of documents on the Web have duplicates, which poses a challenge for developing efficient methods to compute clusters of similar documents. In this paper we use ...
Dmitry I. Ignatov, Sergei O. Kuznetsov
CIKM
2007
Springer
14 years 1 month ago
Semiautomatic evaluation of retrieval systems using document similarities
Taking advantage of the well-known cluster hypothesis that “closely associated documents tend to be relevant to the same request”, we can use inter-document similarity to prov...
Ben Carterette, James Allan
SIGIR
2008
ACM
13 years 7 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
SAC
2011
ACM
12 years 10 months ago
Biomedical concept extraction based on combining the content-based and word order similarities
It is well known that the main objective of conceptual retrieval models is to go beyond simple term matching by relaxing the term independence assumption through concept recognition. ...
Duy Dinh, Lynda Tamine