Sciweavers

Search results for: Crawling on web graphs
SIGIR 2008, ACM
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
ICIP 2000, IEEE
Efficient Video Similarity Measurement and Search
We consider the use of meta-data and/or video-domain methods to detect similar videos on the web. Meta-data is extracted from the textual and hyperlink information associated with...
Sen-Ching S. Cheung, Avideh Zakhor
WWW 2004, ACM
Combining link and content analysis to estimate semantic similarity
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
Filippo Menczer
USS 2008
There Is No Free Phish: An Analysis of "Free" and Live Phishing Kits
Phishing is a form of identity theft in which an attacker attempts to elicit confidential information from unsuspecting victims. While in the past there has been significant work ...
Marco Cova, Christopher Kruegel, Giovanni Vigna
ECIR 2006, Springer
Automatic Document Organization in a P2P Environment
This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble based meta methods. We co...
Stefan Siersdorfer, Sergej Sizov