Sciweavers

368 search results for: Template-Based Information Mining from HTML Documents
ICDAR 1997, IEEE
Representing OCRed documents in HTML
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari
WOA 2001
Object Oriented Mapping for HTML Documents
Emerging distributed technologies aim to provide simple and powerful tools for web services design and implementation. Main vendors provide modern frameworks so that a good coordi...
Francesco Garelli, Carlo Ferrari
ICTAI 1999, IEEE
A New Study on Using HTML Structures to Improve Retrieval
Locating useful information effectively from the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology of using the structures and hyperlinks ...
Michal Cutler, H. Deng, S. Maniccam, Weiyi Meng
DMKD 2003, ACM
Deriving link-context from HTML tag tree
HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link or the link-context is used for a variety of tasks...
Gautam Pant
IJCAI 2003
Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference
Information extraction (IE) aims at extracting specific information from a collection of documents. A lot of previous work on IE from semi-structured documents (in XML or HTML) us...
Raymond Kosala, Maurice Bruynooghe, Jan Van den Bu...