Search Sciweavers | Sciweavers

543 search results - page 41 / 109

Exploiting content redundancy for web information extraction


WWW
2008
ACM

Category: Internet Technology

Improving web spam detection with re-extracted features

16 years 5 months ago

Download www2008.org

Web spam detection has become one of the top challenges for the Internet search industry. Instead of using some heuristic rules, we propose a feature re-extraction strategy to opt...

Guanggang Geng, Chunheng Wang, Qiudan Li


COOPIS
1997
IEEE

Category: Information Technology

Semi-Automatic Wrapper Generation for Internet Information Sources

15 years 8 months ago

Download www.isi.edu

To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), we are building tools to build informatio...

Naveen Ashish, Craig A. Knoblock


WEBI
2010
Springer

Category: Internet Technology

Reducing the Cold-Start Problem in Content Recommendation through Opinion Classification

15 years 2 months ago

Download www.univ-orleans.fr

Like search engines, recommender systems have become a tool that cannot be ignored by websites with a large selection of products, music, news or simply webpages links. The perform...

Damien Poirier, Françoise Fessant, Isabelle...


EMNLP
2010

Category: Natural Language Processing

Incorporating Content Structure into Text Analysis Applications

15 years 2 months ago

Download people.csail.mit.edu

In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the lingu...

Christina Sauper, Aria Haghighi, Regina Barzilay


SIGIR
2008
ACM

Category: Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

15 years 4 months ago

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...
