The proliferation of video content on the web makes similarity detection an indispensable tool in web data management, searching, and navigation. We have previously proposed a compact representation of video clips, called the video signature, for retrieving similar video clips in large databases. In this paper, we propose a new signature clustering algorithm to further improve retrieval performance. The algorithm treats all the signatures as an abstract threshold graph, where the threshold is determined based on local data statistics. Similar clusters are identified as highly connected regions in the graph. This algorithm outperforms simple thresholding and hierarchical clustering techniques in identifying a set of manually determined similar clusters from a dataset of 46,356 web video clips. At 95% precision, our algorithm attains 85% recall, while simple thresholding and the complete-link hierarchical scheme attain 67% and 75% recall, respectively. Applying our algorithm to the entire dataset, 6,900 ...
Sen-Ching S. Cheung, Avideh Zakhor
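
The threshold-graph idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how the threshold is derived from local data statistics or how "highly connected" is defined, so the sketch assumes a single global `threshold` and uses edge density within a connected component (`min_density`) as a stand-in criterion. The function name and both parameters are hypothetical.

```python
from itertools import combinations


def threshold_graph_clusters(dist, threshold, min_density=0.8):
    """Return dense connected components of a threshold graph.

    dist        -- symmetric square matrix (list of lists) of pairwise distances
    threshold   -- ASSUMPTION: one global cutoff; the paper uses local statistics
    min_density -- ASSUMPTION: minimum fraction of possible edges that must be
                   present for a component to count as "highly connected"
    """
    n = len(dist)

    # Build the threshold graph: connect two items when they are close enough.
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            adj[i].add(j)
            adj[j].add(i)

    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue

        # Collect one connected component with an iterative depth-first search.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)

        if len(component) < 2:
            continue  # isolated items do not form a cluster

        # Edge density = edges present / edges in a complete graph of this size.
        edges = sum(len(adj[u]) for u in component) // 2
        possible = len(component) * (len(component) - 1) // 2
        if edges / possible >= min_density:
            clusters.append(sorted(component))

    return clusters
```

With this stand-in criterion, a group of items that are all mutually within the threshold is kept as a cluster, whereas a chain of items linked only to their immediate neighbours forms a sparse component and is rejected. Plain thresholding on connectivity alone would merge such a chain into one cluster, which is the kind of error a connectivity-density test is meant to reduce.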