Sciweavers

» Representing OCRed documents in HTML
DEXA
2005
Springer
14 years 1 month ago
An XML Approach to Semantically Extract Data from HTML Tables
Data-intensive information is often published on the internet in the form of HTML tables. Extracting some of the information that is of users’ interest from the inter...
Jixue Liu, Zhuoyun Ao, Ho-Hyun Park, Yongfeng Chen
JMM2
2007
13 years 7 months ago
On Separation of English Numerals from Multilingual Document Images
For Optical Character Recognition (OCR) of bilingual or multilingual documents containing text words in a regional language and numerals in English, it is necessary to identify di...
Basanna V. Dhandra, Mallikarjun Hangarge
ICDAR
2003
IEEE
14 years 25 days ago
Indexing and retrieval of words in old documents
This paper describes a system for efficient indexing and retrieval of words in collections of document images. The proposed method is based on two main principles: unsupervised pr...
Simone Marinai, Emanuele Marino, Giovanni Soda
RULEML
2004
Springer
14 years 27 days ago
Rule Learning for Feature Values Extraction from HTML Product Information Sheets
The Web is now a huge information repository with a rich semantic structure that, however, is primarily addressed to human understanding rather than automated processing by a compu...
Costin Badica, Amelia Badica
APCCM
2009
13 years 8 months ago
Extracting and Modeling the Semantic Information Content of Web Documents to Support Semantic Document Retrieval
Existing HTML mark-up is used only to indicate the structure and layout of documents, but not the document semantics. As a result, web documents are difficult to be semantically p...
Shahrul Azman Noah, Lailatulqadri Zakaria, Arifah ...