
CEAS
2007
Springer

Hardening Fingerprinting by Context

Near-duplicate detection is not only an important pre- and post-processing task in Information Retrieval but also an effective spam-detection technique. Among the different approaches to near-replica detection, methods based on document signatures are particularly attractive due to their scalability to massive document collections and their ability to handle high throughput rates. Their weakness lies in the potential brittleness of signatures to small changes in content, which makes them vulnerable to various types of noise. In the important spam-filtering application, this vulnerability can also be exploited by dedicated attackers aiming to maximally fragment signatures corresponding to the same email campaign. We focus on the I-Match algorithm and present a method of strengthening it by considering the usage context when deciding which portions of a document should affect signature generation. This substantially (almost 100-fold in some cases) increases the difficulty of dedicated att...
Aleksander Kolcz, Abdur Chowdhury
Type Conference
Year 2007
Where CEAS
Authors Aleksander Kolcz, Abdur Chowdhury
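The abstract builds on I-Match without restating it. As I-Match is commonly described in earlier work by Chowdhury et al., it keeps only the terms whose collection IDF falls within a chosen range and hashes the sorted set of surviving terms with SHA-1, so documents with identical filtered term sets receive the same signature. The sketch below is a minimal illustration under that assumption. The tokenizer, the IDF thresholds, the toy corpus and all function names are hypothetical. It does not implement the paper's context-based hardening, whose details are not given in this abstract.

```python
import hashlib
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase alphanumeric tokens; this tokenizer is an illustrative assumption.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_lexicon(corpus, min_idf, max_idf):
    """Keep terms whose IDF lies in [min_idf, max_idf] (hypothetical thresholds)."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # document frequency: count each term once per doc
    return {t for t, d in df.items() if min_idf <= math.log(n_docs / d) <= max_idf}


def imatch_signature(text, lexicon):
    """SHA-1 over the sorted set of document terms that survive the lexicon filter."""
    kept = sorted(set(tokenize(text)) & lexicon)
    if not kept:
        return None  # no retained terms, so no signature is produced
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


def cluster_by_signature(docs, lexicon):
    """Group document indices that share an identical signature."""
    groups = defaultdict(list)
    for i, doc in enumerate(docs):
        sig = imatch_signature(doc, lexicon)
        if sig is not None:
            groups[sig].append(i)
    return dict(groups)


# Toy usage (hypothetical data): two near-identical spam messages and two unrelated ones.
corpus = [
    "cheap meds online buy now",
    "cheap meds online buy now today",
    "meeting agenda for monday",
    "project report due friday",
]
lexicon = build_lexicon(corpus, min_idf=0.5, max_idf=1.0)
groups = cluster_by_signature(corpus, lexicon)

# Removing the retained term "now" changes the signature: the brittleness
# that an attacker can exploit to fragment one campaign into many signatures.
variant_sig = imatch_signature("cheap meds online buy", lexicon)
```

Here the appended word "today" falls outside the IDF range, so the first two messages still collide, while dropping the retained word "now" produces a new signature. Any edit that touches a retained term fragments the cluster, which is the weakness the abstract targets.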