Building a Better Similarity Trap with Statistically Improbable Features


One of the persistent topics in digital forensic research in recent years has been the problem of finding all things similar. The tools developed usually take the form of a similarity, or fuzzy, hash. In this paper, we present a generic empirical study of the problem of finding common features in binary data. Specifically, we study the problem of false positives and demonstrate that similarity tools work only as well as the underlying data allows them to and, therefore, must be aware of the basic properties of the input. We propose a new feature selection algorithm, which is based on the notion of statistically improbable features. We also show that the proposed method can be tuned to account for the type-specific distribution of false positives.

Vassil Roussev


Biometrics | Digital Forensic Research | False Positives | Generic Empirical Study | HICSS 2009 | System Sciences


Post Info

Added	19 May 2010
Updated	19 May 2010
Type	Conference
Year	2009
Where	HICSS
Authors	Vassil Roussev
