Sciweavers

103 search results - page 4 / 21
» Models and Algorithms for Duplicate Document Detection
SIGIR
2006
ACM
14 years 1 month ago
Near-duplicate detection by instance-level constrained clustering
For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-words comparison approaches used in information...
Hui Yang, James P. Callan
ICDAR
2005
IEEE
14 years 29 days ago
A Model for Detecting and Merging Vertically Spanned Table Cells in Plain Text Documents
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significan...
Vanessa Long, Robert Dale, Steve Cassidy
ICDAR
2003
IEEE
14 years 19 days ago
A Line Drawings Degradation Model for Performance Characterization
Line detection algorithms constitute the basis for technical document analysis and recognition. The performance of these algorithms decreases as the quality of the documents degra...
Jian Zhai, Liu Wenyin, Dov Dori, Qing Li
CIVR
2007
Springer
14 years 1 month ago
Scalable near identical image and shot detection
This paper proposes and compares two novel schemes for near duplicate image and video-shot detection. The first approach is based on global hierarchical colour histograms, using ...
Ondrej Chum, James Philbin, Michael Isard, Andrew ...
ICDAR
2003
IEEE
14 years 19 days ago
A Model-based Line Detection Algorithm in Documents
In this paper we present a novel model based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the ...
Yefeng Zheng, Huiping Li, David S. Doermann