Sciweavers

» Extracting Partial Structures from HTML Documents
WWW
2008
ACM
14 years 9 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev
ICDAR
2003
IEEE
14 years 1 months ago
Automated Detection and Segmentation of Table of Contents Page from Document Images
With an aim to extract the structural information from the table of contents (TOC) to help develop a digital document library, the requirement of identifying/segmenting the TOC page ...
S. Mandal, S. P. Chowdhury, Amit Kumar Das, Bhabat...
ASP
2005
Springer
13 years 10 months ago
Exploiting ASP for Semantic Information Extraction
Abstract. The paper describes HıLεX, a new ASP-based system for the extraction of information from unstructured documents. Unlike previous systems, which are mainly syntactic, Hı...
Massimo Ruffolo, Nicola Leone, Marco Manna, Domeni...
ESWS
2007
Springer
14 years 2 months ago
A Unified Approach to Retrieving Web Documents and Semantic Web Data
The Semantic Web seems to be evolving into a property-linked web of RDF data, conceptually divorced from (but physically housed in) the hyperlinked web of HTML documents. We discus...
Trivikram Immaneni, Krishnaprasad Thirunarayan
WWW
2005
ACM
14 years 9 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger