Sciweavers

Search results for: Exploiting content redundancy for web information extraction
WWW 2008, ACM
Improving web spam detection with re-extracted features
Web spam detection has become one of the top challenges for the Internet search industry. Instead of using some heuristic rules, we propose a feature re-extraction strategy to opt...
Guanggang Geng, Chunheng Wang, Qiudan Li
COOPIS 1997, IEEE
Semi-Automatic Wrapper Generation for Internet Information Sources
To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), we are building tools to build informatio...
Naveen Ashish, Craig A. Knoblock
WEBI 2010, Springer
Reducing the Cold-Start Problem in Content Recommendation through Opinion Classification
Like search engines, recommender systems have become a tool that cannot be ignored by websites with a large selection of products, music, news or simply webpage links. The perform...
Damien Poirier, Françoise Fessant, Isabelle...
EMNLP 2010
Incorporating Content Structure into Text Analysis Applications
In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the lingu...
Christina Sauper, Aria Haghighi, Regina Barzilay
SIGIR 2008, ACM
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...