A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
The identification of near-duplicate keyframe (NDK) pairs is a useful task for a variety of applications such as news story threading and content-based video search. In this pape...
Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, learning in-depth knowledge about a topic...
Large search engines process thousands of queries per second on billions of pages, making query processing a major factor in their operating costs. This has led to a lot of resear...
Online service providers are engaged in constant conflict with miscreants who try to siphon a portion of legitimate traffic to make illicit profits. We study the abuse of “tr...
Tyler Moore, Nektarios Leontiadis, Nicolas Christi...