The amount of information available in the MEDLINE database makes it very hard for a researcher to retrieve a reasonable amount of relevant documents using a simple query language ...
A degradation model that describes many image degradations produced by desktop scanning is used to study the edge noise that is present in bilevel document images. The standard de...
Craig McGillivary, Chris Hale, Elisa H. Barney Smi...
A web-portal providing access to over 250.000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle ...
The Arabic language is a highly flexional and morphologically very rich language. It presents serious challenges to the automatic classification of documents, one of which is deter...