Sciweavers search results

DEXA 2005, Springer

An XML Approach to Semantically Extract Data from HTML Tables

Data-intensive information is often published on the internet in the format of HTML tables. Extracting some of the information that is of users’ interest from the inter...

Jixue Liu, Zhuoyun Ao, Ho-Hyun Park, Yongfeng Chen

JMM2 2007

On Separation of English Numerals from Multilingual Document Images

Available from www.academypublisher.com

For Optical Character Recognition (OCR) of bilingual or multilingual documents containing text words in a regional language and numerals in English, it is necessary to identify di...

Basanna V. Dhandra, Mallikarjun Hangarge

ICDAR 2003, IEEE

Indexing and retrieval of words in old documents

Available from www.dsi.unifi.it

This paper describes a system for efficient indexing and retrieval of words in collections of document images. The proposed method is based on two main principles: unsupervised pr...

Simone Marinai, Emanuele Marino, Giovanni Soda

RULEML 2004, Springer

Rule Learning for Feature Values Extraction from HTML Product Information Sheets

Available from software.ucv.ro

The Web is now a huge information repository with a rich semantic structure that, however, is primarily addressed to human understanding rather than automated processing by a compu...

Costin Badica, Amelia Badica

APCCM 2009

Extracting and Modeling the Semantic Information Content of Web Documents to Support Semantic Document Retrieval

Available from crpit.com

Existing HTML mark-up is used only to indicate the structure and layout of documents, but not the document semantics. As a result, web documents are difficult to be semantically p...

Shahrul Azman Noah, Lailatulqadri Zakaria, Arifah ...