Content identification has many applications, ranging from preventing illegal sharing of copyrighted content on video sharing websites, to automatic identification and tagging of content. Several content identification techniques based on watermarking or robust hashes have been proposed in the literature, but they have mostly been evaluated through experiments. This paper analyzes binary hash-based content identification schemes under a decision theoretic framework and presents a lower bound on the length of the hash required to correctly identify multimedia content that may have undergone modifications. A practical scheme for content identification is evaluated under the proposed framework. The results obtained through experiments agree very well with the performance suggested by the theoretical analysis. Categories and Subject Descriptors K.6.5 [Management of Computing and Information Systems]: Security and Protection General Terms Security Keywords Content identification, content f...
Avinash L. Varna, Ashwin Swaminathan, Min Wu