Large-Scale Discovery of Spatially Related Images

15 years 5 months ago


We propose a randomized data mining method that finds clusters of spatially overlapping images. The core of the method relies on the min-Hash algorithm for fast detection of pairs of images with spatial overlap, the so-called cluster seeds. The seeds are then used as visual queries to obtain clusters, which are formed as transitive closures of sets of partially overlapping images that include the seed. We show that the probability of finding a seed for an image cluster rapidly increases with the size of the cluster. The properties and performance of the algorithm are demonstrated on datasets with 10⁴, 10⁵, and 5 · 10⁶ images. The speed of the method depends on the size of the database and on the number of clusters. The first stage of seed generation is close to linear for database sizes up to approximately 2³⁴ ≈ 10¹⁰ images. On a single 2.4 GHz PC, the clustering process took only 24 minutes for a standard database of more than one hundred thousand images, i.e. only 0.014 second...

Ondrej Chum, Jiri Matas
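
The seed-generation stage described in the abstract can be illustrated with a minimal min-Hash sketch. This is not the authors' implementation: it assumes each image is already represented as a set of integer visual-word IDs, and the function names, the linear hash family, and the parameters num_sketches and sketch_size are all hypothetical choices made for illustration. Two images become a candidate seed when at least one of their sketches (short tuples of min-Hash values) is identical; for a single min-Hash the collision probability equals the Jaccard similarity of the two sets, which is the standard min-Hash property. Verification of spatial overlap and the query-expansion stage are left out.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # Mersenne prime; word IDs are assumed to be smaller than this


def make_hash_funcs(num_funcs, seed=0):
    """Draw (a, b) pairs for hashes h(w) = (a*w + b) mod PRIME.

    Assumption: this linear family only approximates the random
    permutations that min-Hash ideally uses.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_funcs)]


def minhash_signature(words, funcs):
    """One min-Hash value per hash function for a non-empty set of word IDs."""
    return tuple(min((a * w + b) % PRIME for w in words) for a, b in funcs)


def find_seeds(images, num_sketches=64, sketch_size=2, seed=0):
    """Return candidate seed pairs: images sharing at least one sketch.

    images: dict mapping image id -> set of integer visual-word IDs.
    """
    funcs = make_hash_funcs(num_sketches * sketch_size, seed)
    buckets = defaultdict(set)
    for image_id, words in images.items():
        if not words:
            continue  # an empty image has no min-Hash
        sig = minhash_signature(words, funcs)
        for k in range(num_sketches):
            sketch = sig[k * sketch_size:(k + 1) * sketch_size]
            # Key on the sketch index so only sketch k is compared with sketch k.
            buckets[(k, sketch)].add(image_id)
    seeds = set()
    for ids in buckets.values():
        seeds.update(combinations(sorted(ids), 2))
    return seeds


def collision_probability(similarity, sketch_size, num_sketches):
    """Chance that two sets with the given Jaccard similarity share a sketch,
    assuming independent min-Hash functions."""
    return 1.0 - (1.0 - similarity ** sketch_size) ** num_sketches


if __name__ == "__main__":
    toy = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4}, "c": {100, 200, 300}}
    print(find_seeds(toy))
    print(collision_probability(0.1, 2, 64))
```

In this sketch, hashing each image once and grouping by sketch avoids comparing all image pairs directly, which is the reason bucketing is used here. Larger sketch_size values make accidental collisions between dissimilar images rarer, while more sketches raise the chance that a genuinely overlapping pair is caught.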
