Search Sciweavers | Sciweavers

SIGIR
2008
ACM

176 views | Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

15 years 6 months ago

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

EJC
2009

144 views | Information Technology

A New Partial Information Extraction Method for Personal Mashup Construction

15 years 4 months ago

Download tokuda-www.cs.titech.ac.jp

Nowadays more and more Web sites generate Web pages containing client-side scripts such as JavaScript and Flash instead of ordinary static HTML pages. These scripts create dynamic ...

Junxia Guo, Hao Han, Takehiro Tokuda

ICWE
2009
Springer

151 views | Internet Technology

A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysis

16 years 1 month ago

Download tokuda-www.cs.titech.ac.jp

Abstract. The traditional Web news article contents extraction methods are time-costly and need much maintenance because they analyze the layout of news pages to generate the wrapp...

Hao Han, Takehiro Tokuda

SIGMOD
2010
ACM

232 views | Database

Optimizing content freshness of relations extracted from the web using keyword search

15 years 6 months ago

Download www2.hawaii.edu

An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data acc...

Mohan Yang, Haixun Wang, Lipyeow Lim, Min Wang

APWEB
2010
Springer

168 views | Internet Technology

ECON: An Approach to Extract Content from Web News Page

15 years 4 months ago

Download pages.cs.wisc.edu

Abstract--This paper provides a simple but effective approach, named ECON, to fully-automatically extract content from Web news page. ECON uses a DOM tree to represent the Web news...

Yan Guo, Huifeng Tang, Linhai Song, Yu Wang 0009, ...
