Sciweavers

48 search results - page 3 / 10
» Collection statistics for fast duplicate document detection
JCB
2007
13 years 7 months ago
Clustered Sequence Representation for Fast Homology Search
We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a se...
Michael Cameron, Yaniv Bernstein, Hugh E. Williams
ICMCS
2007
IEEE
14 years 1 months ago
SICO: A System for Detection of Near-Duplicate Images During Search
Duplicate and near-duplicate digital image matching is beneficial for image search in terms of collection management, digital content protection, and search efficiency. In this ...
Jun Jie Foo, Ranjan Sinha, Justin Zobel
SIGIR
2010
ACM
13 years 2 months ago
Efficient partial-duplicate detection based on sequence matching
With the ever-increasing growth of the Internet, numerous copies of documents have become a serious problem for search engines, opinion mining, and many other web applications. Since parti...
Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
ICAIL
2007
ACM
13 years 11 months ago
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
INEX
2007
Springer
14 years 1 months ago
Phrase Detection in the Wikipedia
The Wikipedia XML collection turned out to be rich in marked-up phrases as we carried out our INEX 2007 experiments. Assuming that a phrase occurs at the inline level of the markup...
Miro Lehtonen, Antoine Doucet