Search Sciweavers | Sciweavers

48 search results - page 4 / 10

» Collection statistics for fast duplicate document detection

KDD
2004
ACM

195 views, Data Mining

Improved robustness of signature-based near-replica detection via lexicon randomization

16 years 7 months ago

Download ir.iit.edu

Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...

Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...

DEXAW
1999
IEEE

91 views, Database

Document Analysis Techniques for the Infinite Memory Multifunction Machine

15 years 11 months ago

Download www.crc.ricoh.com

A system that saves a digital copy of every document that users copy, print, or fax, without asking the user, has recently been proposed. Referred to as the Infinite Memory Multif...

Jonathan J. Hull, Dar-Shyang Lee, John F. Cullen, ...

SIGIR
2006
ACM

84 views, Information Technology

Near-duplicate detection by instance-level constrained clustering

16 years 19 days ago

Download www.cs.cmu.edu

For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-words comparison approaches used in information...

Hui Yang, James P. Callan

DAS
2008
Springer

126 views, Document Analysis

A Fast Preprocessing Method for Table Boundary Detection: Narrowing Down the Sparse Lines Using Solely Coordinate Information

15 years 8 months ago

Download chemxseer.ist.psu.edu

With the rapid growth of PDF documents in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...

Ying Liu, Prasenjit Mitra, C. Lee Giles

DGO
2006

134 views, Education

Next steps in near-duplicate detection for eRulemaking

15 years 8 months ago

Download www.cs.cmu.edu

Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...

Hui Yang, Jamie Callan, Stuart W. Shulman
