In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...
We report on the design and implementation of a system which automates the process of capturing structured documents from the optically recognized form of printed materials. The sy...
The problem of jointly modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned wi...
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Cov...
Several scalable media codecs have been standardized in recent years to cope with heterogeneous usage conditions and aim to always provide audio, video and image content in t...
An innovative algorithm for automatic generation of Huffman coding tables for semantic classes of digital images is presented. Collecting statistics over a large dataset of corresp...