Sciweavers

52 search results for "Representing OCRed documents in HTML" (page 5 of 11)
DOCENG
2004
ACM
14 years 1 month ago
Supervised learning for the legacy document conversion
We consider the problem of document conversion from the rendering-oriented HTML markup into a semantic-oriented XML annotation defined by user-specific DTDs or XML Schema descrip...
Boris Chidlovskii, Jérôme Fuselier
ICDAR
2009
IEEE
14 years 2 months ago
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
There are a number of established products on the market for wrapping—semi-automatic navigation and extraction of data—from web pages. These solutions make use of the inherent...
Tamir Hassan
ICDAR
2011
IEEE
12 years 7 months ago
Towards Searchable Digital Urdu Libraries - A Word Spotting Based Retrieval Approach
Libraries in South Asia hold huge collections of valuable printed documents in Urdu and it is of interest to digitize these collections to make them more accessible. The unavail...
Ali Abidi, Imran Siddiqi, Khurram Khurshid
DOCENG
2009
ACM
14 years 2 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
HICSS
1999
IEEE
13 years 12 months ago
ASHRAM: Active Summarization and Markup
Typically, searching for information in a document collection amounts to refining a query and then scanning a large number of documents to determine their relevance. Active Summar...
Mary S. Neff, James W. Cooper