
CVPR
2009
IEEE

Robust unsupervised segmentation of degraded document images with topic models

Segmentation of document images remains a challenging vision problem. Although document images have a structured layout, capturing enough of it for segmentation can be difficult. Most current methods combine text extraction and heuristics for segmentation, but text extraction is prone to failure and measuring accuracy remains difficult. Furthermore, when presented with significant degradation, many common heuristic methods fall apart. In this paper, we propose a Bayesian generative model for document images that seeks to overcome some of these drawbacks. Our model automatically discovers the different regions present in a document image in a completely unsupervised fashion. We attempt no text extraction, but rather use discrete patch-based codebook learning to make our probabilistic representation feasible. Each latent region topic is a distribution over these patch indices. We capture rough document layout with an MRF Potts model. We take an analysis-by-synthesis approach ...
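As a rough illustration of two ingredients the abstract names, the sketch below maps image patches to discrete codebook indices and evaluates a Potts smoothness energy on a grid of region labels. It is not the authors' implementation: the function names (extract_patches, quantize, potts_energy), the non-overlapping patch grid, the squared Euclidean nearest-codeword rule, the 4-connected neighborhood, and the beta weight are all assumptions made for this example. The codebook is taken as given; how it is learned and how the topic model is inferred are not shown.

```python
import numpy as np


def extract_patches(image, size):
    """Cut a 2-D grayscale image into non-overlapping size x size patches.

    Returns one flattened patch per row, plus the (rows, cols) patch grid shape.
    Pixels that do not fill a whole patch at the right/bottom edge are dropped.
    """
    h, w = image.shape
    rows, cols = h // size, w // size
    trimmed = image[: rows * size, : cols * size]
    patches = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return patches.reshape(rows * cols, size * size), (rows, cols)


def quantize(patches, codebook):
    """Assign each patch the index of its nearest codeword.

    Assumption: nearest means smallest squared Euclidean distance.
    """
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def potts_energy(labels, beta=1.0):
    """Potts energy of a 2-D label grid.

    Counts 4-connected neighbor pairs whose labels differ and scales by beta,
    so smoother labelings get lower energy.
    """
    vertical = labels[1:, :] != labels[:-1, :]
    horizontal = labels[:, 1:] != labels[:, :-1]
    return beta * (vertical.sum() + horizontal.sum())
```

In a full model, each latent region topic would be a distribution over the indices returned by quantize, and a term like potts_energy would favor neighboring patches sharing a region label.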
Timothy J. Burns, Jason J. Corso
Added 04 Sep 2010
Updated 04 Sep 2010
Type Conference
Year 2009
Where CVPR
Authors Timothy J. Burns, Jason J. Corso