Search Sciweavers | Sciweavers

543 search results - page 36 / 109

» Exploiting content redundancy for web information extraction


WIDM
2003
ACM

130 views, Internet Technology

Datarover: a taxonomy based crawler for automated data extraction from data-intensive websites


The advent of e-commerce has created a trend that brought thousands of catalogs online. Most of these websites are “taxonomy-directed”. A Web site is said to be “taxonomy-dire...

Hasan Davulcu, S. Koduri, Saravanakumar Nagarajan



USENIX
2007

60 views, Operating System

Supporting Practical Content-Addressable Caching with CZIP Compression


Content-based naming (CBN) enables content sharing across similar files by breaking files into position-independent chunks and naming these chunks using hashes of their contents....

KyoungSoo Park, Sunghwan Ihm, Mic Bowman, Vivek S....



WWW
2009
ACM

209 views, Internet Technology

Incorporating site-level knowledge to extract structured data from web forums


Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...

Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...



HT
2009
ACM

148 views, Internet Technology

Retrieving broken web links using an approach based on contextual information


In this short note we present a recommendation system for automatic retrieval of broken Web links using an approach based on contextual information. We extract information from th...

Juan Martinez-Romo, Lourdes Araujo



LREC
2010

170 views, Education

Construction of Text Summarization Corpus for the Credibility of Information on the Web


Recently, the credibility of information on the Web has become an important issue. In addition to telling about content of source documents, indicating how to interpret the conten...

Masahiro Nakano, Hideyuki Shibuki, Rintaro Miyazak...

