Abstract. The Scale Invariant Feature Transform (SIFT) is an algorithm used to detect and describe scale-, translation- and rotation-invariant local features in images. The origina...
Common visual codebook generation methods used in a Bag of Visual Words model, e.g. k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual...
Abstract. The recognition of events in videos is a relevant and challenging task of automatic semantic video analysis. At present, one of the most successful frameworks, used for ob...
Lamberto Ballan, Marco Bertini, Alberto Del Bimbo,...
We describe a method to align ASL video subtitles with a closed-caption transcript. Our alignments are partial, based on spotting words within the video sequence, which consists o...
We consider the 'group motion segmentation' problem and provide a solution for it. The group motion segmentation problem aims at analyzing motion trajectories of multiple obj...