Sciweavers

» Expected Utility of Content Blocks in Web Content Extraction
WIDM
2003
ACM
14 years 2 months ago
Datarover: a taxonomy based crawler for automated data extraction from data-intensive websites
The advent of e-commerce has created a trend that brought thousands of catalogs online. Most of these websites are “taxonomy-directed”. A Web site is said to be “taxonomydire...
Hasan Davulcu, S. Koduri, Saravanakumar Nagarajan
HT
2006
ACM
14 years 2 months ago
Journey to the past: proposal of a framework for past web browser
While the Internet community recognized early on the need to store and preserve past content of the Web for future use, the tools developed so far for retrieving information from ...
Adam Jatowt, Yukiko Kawai, Satoshi Nakamura, Yutak...
CHI
1996
ACM
14 years 1 month ago
Silk from a Sow's Ear: Extracting Usable Structures from the Web
In its current implementation, the World-Wide Web lacks much of the explicit structure and strong typing found in many closed hypertext systems. While this property has directly f...
Peter Pirolli, James E. Pitkow, Ramana Rao
WSDM
2009
ACM
14 years 3 months ago
Query by document
We are experiencing an unprecedented increase of content contributed by users in forums such as blogs, social networking sites and microblogging services. Such abundance of conten...
Yin Yang, Nilesh Bansal, Wisam Dakka, Panagiotis G...
SAMT
2007
Springer
14 years 3 months ago
Document Layout Substructure Discovery
Abstract. In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...
Claudio Andreatta