We consider the use of meta-data and/or video-domain methods to detect similar videos on the web. Meta-data is extracted from the textual and hyperlink information associated with each video clip. In the video domain, we apply an efficient similarity detection algorithm called video signature. The idea is to form a signature for each clip by selecting a small number of its frames that are most similar to a set of random seed images. We then apply a statistical pruning algorithm to allow fast detection on very large databases. On a small ground-truth set, we achieve 90% recall and 95% precision using only 8% of the total number of operations required without pruning. For a database of around 46,000 video clips crawled from the web, the video signature technique significantly outperforms meta-data in precision and recall. We show that even better performance can be achieved by combining the two. Based on our measurements, each video clip in our database has,
Sen-Ching S. Cheung, Avideh Zakhor
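
The signature idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the frame features, the distance measure, or how two signatures are compared, so the sketch assumes each frame is already reduced to a fixed-length feature vector, uses the L1 distance, and compares signatures by averaging the distance between the frames chosen for the same seed. The function names (`make_seeds`, `video_signature`, `signature_distance`) are hypothetical, and the statistical pruning step is omitted.

```python
import random


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def make_seeds(num_seeds, dim, rng):
    """Draw random seed vectors (assumption: seeds live in the frame feature space)."""
    return [[rng.random() for _ in range(dim)] for _ in range(num_seeds)]


def video_signature(frames, seeds):
    """For each seed, keep the frame of the clip that is closest to that seed."""
    return [min(frames, key=lambda f: l1_distance(f, s)) for s in seeds]


def signature_distance(sig_a, sig_b):
    """Assumed comparison rule: mean distance between frames paired by seed."""
    return sum(l1_distance(a, b) for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Toy data: clip_b is a slightly perturbed copy of clip_a, clip_c is unrelated.
rng = random.Random(0)
dim, num_frames = 8, 30
seeds = make_seeds(5, dim, rng)
clip_a = [[rng.random() for _ in range(dim)] for _ in range(num_frames)]
clip_b = [[x + rng.uniform(-0.01, 0.01) for x in frame] for frame in clip_a]
clip_c = [[rng.random() for _ in range(dim)] for _ in range(num_frames)]

sig_a = video_signature(clip_a, seeds)
sig_b = video_signature(clip_b, seeds)
sig_c = video_signature(clip_c, seeds)

print("near-duplicate distance:", signature_distance(sig_a, sig_b))
print("unrelated distance:", signature_distance(sig_a, sig_c))
```

Because every clip is summarized against the same seed set, two clips can be compared through a handful of frames per clip rather than frame by frame, which is what makes the approach attractive for large collections.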