We present a document expansion approach that uses Conditional Random Field (CRF) segmentation to automatically extract salient phrases from ad titles. We then supplement the ad d...
In this paper, a high-speed document image classification algorithm is presented. The algorithm is based on the bottom-up strategy which can successfully segment and classify any ...
This paper presents an Italic/Roman word type recognition system without a priori knowledge on the characters' font. This method aims at analyzing old documents in which char...
We describe a segmentation method and associated file format for storing images of color documents. We separate each page of the document into three layers, containing the backgro...
Daniel P. Huttenlocher, Pedro F. Felzenszwalb, Wil...
Compound (or mixed) document images contain graphic or textual content along with pictures. They are a very common form of documents, found in magazines, brochures, web-sites etc....