Sciweavers

543 search results - page 36 / 109
» Exploiting content redundancy for web information extraction
WIDM
2003
ACM
14 years 1 month ago
Datarover: a taxonomy based crawler for automated data extraction from data-intensive websites
The advent of e-commerce has created a trend that brought thousands of catalogs online. Most of these websites are “taxonomy-directed”. A Web site is said to be “taxonomydire...
Hasan Davulcu, S. Koduri, Saravanakumar Nagarajan
USENIX
2007
13 years 10 months ago
Supporting Practical Content-Addressable Caching with CZIP Compression
Content-based naming (CBN) enables content sharing across similar files by breaking files into position-independent chunks and naming these chunks using hashes of their contents....
KyoungSoo Park, Sunghwan Ihm, Mic Bowman, Vivek S....
WWW
2009
ACM
14 years 8 months ago
Incorporating site-level knowledge to extract structured data from web forums
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
HT
2009
ACM
13 years 5 months ago
Retrieving broken web links using an approach based on contextual information
In this short note we present a recommendation system for automatic retrieval of broken Web links using an approach based on contextual information. We extract information from th...
Juan Martinez-Romo, Lourdes Araujo
LREC
2010
13 years 9 months ago
Construction of Text Summarization Corpus for the Credibility of Information on the Web
Recently, the credibility of information on the Web has become an important issue. In addition to telling about content of source documents, indicating how to interpret the conten...
Masahiro Nakano, Hideyuki Shibuki, Rintaro Miyazak...