For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significan...
Line detection algorithms constitute the basis for technical document analysis and recognition. The performance of these algorithms decreases as the quality of the documents degra...
This paper proposes and compares two novel schemes for near duplicate image and video-shot detection. The first approach is based on global hierarchical colour histograms, using ...
Ondrej Chum, James Philbin, Michael Isard, Andrew ...
In this paper we present a novel model based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the ...