We report on the design and implementation of a system that automates the process of capturing structured documents from the optically recognized form of printed materials. The sy...
Semantic analysis of a document collection can be viewed as an unsupervised clustering of the constituent words and documents around hidden or latent concepts. This has been shown to i...
An algorithm is presented that automatically matches images of presentation slides to the symbolic source file (e.g., PowerPoint™ or Acrobat™) from which they were generated. T...
Graphics detection and recognition are fundamental research problems in document image analysis and retrieval. As one of the most pervasive graphical elements in business and gove...
Multimedia data collections embedded in social networks may be explored from the point of view of varying document and user characteristics. In this paper, we develop a unifi...