Sciweavers

» Effective techniques for automatic extraction of Web publica...
ICDAR
2003
IEEE
14 years 1 month ago
Identifying Story and Preview Images in News Web Pages
The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images serving various different purposes. Th...
Jianying Hu, Amit Bagga
AAAI
2008
13 years 10 months ago
Automatic Extraction of Data Points and Text Blocks from 2-Dimensional Plots in Digital Documents
Two-dimensional (2-D) plots in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can ...
Saurabh Kataria, William Browuer, Prasenjit Mitra,...
DOCENG
2009
ACM
14 years 2 months ago
Web document text and images extraction using DOM analysis and natural language processing
HP Laboratories, HPL-2009-187. Web page text extraction,...
Parag Mulendra Joshi, Sam Liu
VLDB
2001
ACM
14 years 3 days ago
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extracti...
Valter Crescenzi, Giansalvatore Mecca, Paolo Meria...
VLDB
2002
ACM
13 years 7 months ago
Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many s...
Panagiotis G. Ipeirotis, Luis Gravano