Sciweavers

502 search results - page 25 / 101
» Extracting Partial Structures from HTML Documents
WWW
2005
ACM
14 years 9 months ago
Web data extraction based on partial tree alignment
This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...
Yanhong Zhai, Bing Liu
DMKD
2000
ACM
14 years 25 days ago
Combining Strategies for Extracting Relations from Text Collections
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use...
Eugene Agichtein, Eleazar Eskin, Luis Gravano
DAS
2006
Springer
13 years 10 months ago
XCDF: A Canonical and Structured Document Format
Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...
Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadja...
ADL
1997
Springer
14 years 19 days ago
Error Tolerant Document Structure Analysis
Successful applications of digital libraries require structured access to sources of information. This paper presents an approach to extract the logical structure of text document...
Bertin Klein, Peter Fankhauser
DOCENG
2004
ACM
14 years 1 month ago
The lifecycle of a digital historical document: structure and content
This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final re...
Apostolos Antonacopoulos, Dimosthenis Karatzas, He...