Much recent action recognition research is based on
space-time interest points extracted from video using a Bag
of Words (BOW) representation. Such work relies mainly
on the discriminative power of individual local space-time
descriptors, whilst ignoring potentially valuable information
about the global spatio-temporal distribution of interest points.
In this paper, we propose a novel action recognition approach
which differs significantly from previous interest-point-based
approaches in that only the global spatio-temporal
distribution of the interest points is exploited.
This is achieved by extracting holistic features from
clouds of interest points accumulated over multiple temporal
scales, followed by automatic feature selection. Our approach
avoids the non-trivial problems faced by previous
interest-point-based methods: selecting the optimal
space-time descriptor, choosing a clustering algorithm for
constructing a codebook, and determining the codebook size. Our model is able
...