Sciweavers

502 search results - page 16 / 101
» Extracting Partial Structures from HTML Documents
Sort
View
WWW
2006
ACM
14 years 9 months ago
Robust web content extraction
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
COMAD
2009
13 years 9 months ago
Querying for relations from the semi-structured Web
We present a class of web queries whose result is a multi-column relation instead of a collection of unstructured documents as in standard web search. The user specifies the query...
Sunita Sarawagi
JCDL
2006
ACM
237views Education» more  JCDL 2006»
14 years 2 months ago
Automatic extraction of table metadata from digital documents
Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and high...
Ying Liu, Prasenjit Mitra, C. Lee Giles, Kun Bai
ICDAR
2003
IEEE
14 years 1 months ago
A Constraint-based Approach to Table Structure Derivation
er presents an approach to deriving an abstract geometric model of a table from a physical representation. The technique developed uses a graph of constraints between cells which ...
Matthew Hurst
KDD
2002
ACM
148views Data Mining» more  KDD 2002»
14 years 8 months ago
Discovering informative content blocks from Web documents
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Shian-Hua Lin, Jan-Ming Ho