Text categorization involves mapping of documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of ...
Document recognition involves many kinds of hypotheses: segmentation hypotheses, classification hypotheses, spatial relationship hypotheses, and so on. Many recognition strategie...
Richard Zanibbi, Dorothea Blostein, James R. Cordy
This paper presents our work on automatically locating charts from document pages, which is an important stage in the chart image recognition and understanding system being develo...
While there are many proposals for path indexes on XML documents, none of them is perfectly suited for indexing large-scale collections of interlinked XML documents. Existing strat...
We compare different strategies to apply statistical machine translation techniques in order to retrieve documents which are a plausible translation of a given source document. Fi...