The difficulty with information retrieval for OCR documents lies in the fact that OCR documents comprise of a significant amount of erroneous words and unfortunately most informat...
Document structure analysis can be regarded as a syntactic analysis problem. The order and containment relations among the physical or logical components of a document page can be...
A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, , affiliation and others) from online biomedical journals to p...
The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting ...
Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sa...