Social and image-sharing websites such as flickr.com have made it easy to gather large collections of images. However, such collections contain many duplicates and highly similar images, which complicates their use. This work tackles the problem of automatically organizing image collections into sets of similar images, hereinafter called image families. We thoroughly compare two approaches to measuring image similarity: global descriptors vs. a set of local descriptors. We assess how these approaches perform as the problem scales up to thousands of images and hundreds of families. We present our results on a new dataset of CD/DVD game covers.
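The two kinds of similarity measure being compared can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. It assumes that a global descriptor is a single fixed-length vector compared by Euclidean distance, and that a local descriptor set is compared by counting nearest-neighbour matches that pass a Lowe-style ratio test. The function names `global_distance` and `local_match_count` and the 0.8 ratio are hypothetical choices made for this example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def global_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two fixed-length global descriptors.

    Assumption: one vector per image; a smaller value means more similar.
    """
    return math.dist(a, b)


def local_match_count(
    desc_a: Sequence[Vector], desc_b: Sequence[Vector], ratio: float = 0.8
) -> int:
    """Count local descriptors in desc_a with a distinctive match in desc_b.

    Assumption (hypothetical parameters): brute-force nearest-neighbour search
    plus a Lowe-style ratio test; a larger count means more similar images.
    """
    # The ratio test needs a nearest and a second-nearest neighbour.
    if len(desc_b) < 2:
        return 0
    matches = 0
    for d in desc_a:
        dists = sorted(math.dist(d, e) for e in desc_b)
        nearest, second = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up.
        if nearest < ratio * second:
            matches += 1
    return matches


if __name__ == "__main__":
    print(global_distance([0.0, 0.0], [3.0, 4.0]))
    print(local_match_count([[0, 0], [10, 10]], [[0, 1], [10, 10], [50, 50]]))
```

The design difference is that the global measure reduces each image to one vector and compares them in a single step, while the local measure compares many small descriptors per image, which costs more per pair.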
Mohamed Aly, Peter Welinder, Mario E. Munich, Piet