Sciweavers

468 search results - page 39 / 94
» Automatic Data Extraction from Data-Rich Web Pages
WWW
2010
ACM
13 years 8 months ago
Exploiting content redundancy for web information extraction
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
ICML
2005
IEEE
14 years 8 months ago
2D Conditional Random Fields for Web information extraction
The Web contains an abundance of useful semistructured information about real-world objects, and our empirical study shows that strong sequence characteristics exist for Web infor...
Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Y...
WWW
2001
ACM
14 years 8 months ago
Crawling the Hidden Web
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Sriram Raghavan, Hector Garcia-Molina
GFKL
2007
Springer
14 years 1 months ago
Supporting Web-based Address Extraction with Unsupervised Tagging
The manual acquisition and modeling of tourist information, e.g. addresses of points of interest, is time-intensive and therefore cost-intensive. Furthermore, the encoded inform...
Berenike Loos, Chris Biemann
CORR
2004
Springer
13 years 7 months ago
Summarizing Encyclopedic Term Descriptions on the Web
We are developing an automatic method to compile an encyclopedic corpus from the Web. In our previous work, paragraph-style descriptions for a term are extracted from Web pages an...
Atsushi Fujii, Tetsuya Ishikawa