The goal of document image analysis is to produce interpretations that match those of a uent and knowledgeable human when viewing the same input. Because computer vision technique...
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document met...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
Sentence Clustering is often used as a first step in Multi-Document Summarization (MDS) to find redundant information. All the same there is no gold standard available. This paper...