Search Sciweavers | Sciweavers

502 search results - page 18 / 101

» Extracting Partial Structures from HTML Documents

WWW
2008
ACM

163 views, Internet Technology

As we may perceive: finding the boundaries of compound documents on the web

16 years 7 months ago

Download www2008.org

This paper considers the problem of identifying compound documents (cDocs) on the Web: groups of web pages that in aggregate constitute semantically coherent information entities...

Pavel Dmitriev

ICDAR
2003
IEEE

199 views, Document Analysis

Automated Detection and Segmentation of Table of Contents Page from Document Images

15 years 12 months ago

Download www.cse.salford.ac.uk

With the aim of extracting structural information from the table of contents (TOC) to help develop a digital document library, the requirement of identifying/segmenting the TOC page ...

S. Mandal, S. P. Chowdhury, Amit Kumar Das, Bhabat...

ASP
2005
Springer

288 views, Automated Reasoning

Exploiting ASP for Semantic Information Extraction

15 years 8 months ago

Download ftp.informatik.rwth-aachen.de

The paper describes HıLεX, a new ASP-based system for the extraction of information from unstructured documents. Unlike previous systems, which are mainly syntactic, H...

Massimo Ruffolo, Nicola Leone, Marco Manna, Domeni...

ESWS
2007
Springer

174 views, Internet Technology

A Unified Approach to Retrieving Web Documents and Semantic Web Data

16 years 26 days ago

Download www.cs.wright.edu

The Semantic Web seems to be evolving into a property-linked web of RDF data, conceptually divorced from (but physically housed in) the hyperlinked web of HTML documents. We discus...

Trivikram Immaneni, Krishnaprasad Thirunarayan

WWW
2005
ACM

154 views, Internet Technology

Thresher: automating the unwrapping of semantic content from the World Wide Web

16 years 7 months ago

Download www2005.org

We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...

Andrew Hogue, David R. Karger
