Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method



In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.

Keywords: Word Spotting, Heterogeneous Document Collections, Dense SIFT Features, Latent Semantic Indexing.
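The pipeline outlined in the abstract (patch descriptors, bag-of-visual-words histograms, latent semantic indexing) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the codebook is assumed to be given (for instance from k-means over training descriptors), plain arrays stand in for dense SIFT descriptors, and ranking by cosine similarity is an assumption, since the abstract does not state which similarity measure is used.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest codeword and count occurrences.

    descriptors: (n_descriptors, dim) array, a stand-in for dense SIFT output.
    codebook:    (n_words, dim) array of visual words (assumed precomputed).
    Returns an L2-normalized histogram of length n_words.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def fit_lsi(patch_histograms, n_topics):
    """Latent semantic indexing via truncated SVD of the patch-by-word matrix.

    Returns the top n_topics right singular vectors, shape (n_topics, n_words).
    """
    _, _, vt = np.linalg.svd(patch_histograms, full_matrices=False)
    return vt[:n_topics]

def rank_patches(query_hist, patch_histograms, basis):
    """Rank patches by cosine similarity to the query in the latent space."""
    q = basis @ query_hist          # (n_topics,)
    p = patch_histograms @ basis.T  # (n_patches, n_topics)
    denom = np.linalg.norm(p, axis=1) * np.linalg.norm(q) + 1e-12
    sims = (p @ q) / denom
    return np.argsort(-sims), sims
```

In use, one histogram would be computed per document patch, the LSI basis fitted once on the stacked histograms, and each query word image converted to a histogram and ranked against all patches with `rank_patches`.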

Marçal Rusiñol, David Aldavert, Ricardo Toledo, Josep Lladós


Document Analysis | Document Collections | Feature Vectors | ICDAR 2011 | Image Collections


Added	24 Dec 2011
Updated	24 Dec 2011
Type	Journal
Year	2011
Where	ICDAR
Authors	Marçal Rusiñol, David Aldavert, Ricardo Toledo, Josep Lladós
