Search Sciweavers | Sciweavers

2677 search results - page 11 / 536

» Extracting Structured Data from Web Pages

SIGIR
2005
ACM

Information Technology

Title extraction from bodies of HTML documents and its application to web page retrieval

16 years 7 days ago

Download research.microsoft.com

This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...

Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...

ER
2007
Springer

Database

Automatic Hidden-Web Table Interpretation by Sibling Page Comparison

16 years 25 days ago

Download www.deg.byu.edu

The longstanding problem of automatic table interpretation still eludes us. Its solution would not only be an aid to table processing applications such as large volume table conve...

Cui Tao, David W. Embley

CIKM
2003
Springer

Information Technology

Extracting unstructured data from template generated web documents

15 years 12 months ago

Download www.ir.iit.edu

We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...

Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...

WEBDB
1999
Springer

Database

Web Ecology: Recycling HTML Pages as XML Documents Using W4F

15 years 11 months ago

Download db.cis.upenn.edu

In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...

Arnaud Sahuguet, Fabien Azavant

SAINT
2003
IEEE

Internet Technology

Extracting Spatial Knowledge from the Web

15 years 12 months ago

Download mccurley.org

The content of the world-wide web is pervaded by information of a geographical or spatial nature, particularly such location information as addresses, postal codes, and telephone ...

Yasuhiko Morimoto, Masaki Aono, Michael E. Houle, ...
