We present an algorithm for detecting human actions
based upon a single given video example of such actions.
The proposed method is unsupervised and requires no learning,
segmentation, or motion estimation. The novel
features employed in our method are space-time locally adaptive
regression kernels (i.e., local descriptors), computed densely
from a query video, which measure the likeness of each voxel to its
spatiotemporal surroundings. Salient features are then extracted
from these descriptors using principal components analysis
(PCA). These are efficiently compared against analogous
features from the target video using a matrix generalization
of the cosine similarity measure. The algorithm yields a
scalar resemblance volume, with each voxel indicating the likelihood
of similarity between the query video and the corresponding
space-time cube of the target video. By employing non-parametric significance
tests and non-maxima suppression, we detect the presence and location of
actions similar to the one in the given query video within the target video.
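
To make the comparison step concrete: one standard matrix generalization of the cosine similarity, consistent with the description above, scores two feature matrices by their Frobenius inner product normalized by their Frobenius norms. In the expression below, F_Q and F_T denote matrices whose columns are the PCA-projected descriptors of the query and of one space-time cube of the target; the notation is ours, not necessarily the paper's.

\rho(F_Q, F_T) \;=\; \frac{\langle F_Q, F_T \rangle_F}{\|F_Q\|_F \, \|F_T\|_F} \;=\; \frac{\operatorname{trace}\!\left(F_Q^{\top} F_T\right)}{\|F_Q\|_F \, \|F_T\|_F} \;\in\; [-1, 1].

A score near 1 indicates that the cube's feature matrix is nearly a positive scalar multiple of the query's, which is what the resemblance volume records at each voxel.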
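The following is a minimal, hypothetical sketch of the overall data flow (dense local descriptors, PCA projection, sliding-window matrix cosine similarity yielding a resemblance volume). All function and parameter names are ours, and dense_descriptors uses a simple gradient-covariance stand-in rather than the paper's space-time locally adaptive regression kernels; it illustrates the pipeline only.

import numpy as np

def dense_descriptors(video, patch=(3, 3, 3)):
    # Stand-in local descriptor: flattened 3x3 covariance of space-time
    # gradients around each voxel (NOT the paper's adaptive kernels).
    gt, gy, gx = np.gradient(video.astype(float))
    grads = np.stack([gt, gy, gx], axis=-1)                  # (T, H, W, 3)
    T, H, W, _ = grads.shape
    rt, ry, rx = (p // 2 for p in patch)
    feats = np.zeros((T, H, W, 9))
    for t in range(T):
        for y in range(H):
            for x in range(W):
                nb = grads[max(t - rt, 0):t + rt + 1,
                           max(y - ry, 0):y + ry + 1,
                           max(x - rx, 0):x + rx + 1].reshape(-1, 3)
                feats[t, y, x] = (nb.T @ nb).ravel()
    return feats

def pca_basis(feats, d=4):
    # Learn a d-dimensional PCA basis from the (query) descriptors.
    X = feats.reshape(-1, feats.shape[-1])
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:d]

def project(feats, mean, basis):
    # Project descriptors of any video onto the learned basis.
    X = feats.reshape(-1, feats.shape[-1]) - mean
    return (X @ basis.T).reshape(feats.shape[:-1] + (basis.shape[0],))

def matrix_cosine(FQ, FT):
    # Frobenius inner product normalized by Frobenius norms.
    return float(np.sum(FQ * FT) /
                 (np.linalg.norm(FQ) * np.linalg.norm(FT) + 1e-12))

def resemblance_volume(query_feats, target_feats):
    # Slide the query feature cube over the target; record one score per voxel.
    qt, qh, qw, _ = query_feats.shape
    tt, th, tw, _ = target_feats.shape
    R = np.zeros((tt - qt + 1, th - qh + 1, tw - qw + 1))
    for t in range(R.shape[0]):
        for y in range(R.shape[1]):
            for x in range(R.shape[2]):
                cube = target_feats[t:t + qt, y:y + qh, x:x + qw]
                R[t, y, x] = matrix_cosine(query_feats, cube)
    return R

# Illustrative usage with random clips (query no larger than target):
# query, target = np.random.rand(8, 16, 16), np.random.rand(40, 64, 64)
# mean, basis = pca_basis(dense_descriptors(query))
# R = resemblance_volume(project(dense_descriptors(query), mean, basis),
#                        project(dense_descriptors(target), mean, basis))

Note that the PCA basis is learned from the query descriptors and then applied to both videos, so query and target features live in the same low-dimensional space before they are compared.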
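Finally, a minimal sketch of turning the resemblance volume R from the previous sketch into detections: a high empirical quantile serves as a simple data-driven stand-in for a non-parametric significance test, followed by local-maximum (non-maxima) suppression over a space-time window. The function name, quantile, and window size are illustrative choices, not the authors' settings.

import numpy as np
from scipy.ndimage import maximum_filter

def detect_actions(R, quantile=0.99, window=(5, 9, 9)):
    # Keep voxels that are both significant (above a data-driven threshold)
    # and local maxima within a space-time neighborhood.
    tau = np.quantile(R, quantile)
    peaks = (R == maximum_filter(R, size=window)) & (R > tau)
    return [(tuple(int(i) for i in idx), float(R[tuple(idx)]))
            for idx in np.argwhere(peaks)]

Each returned item is a (t, y, x) location in the target video together with its resemblance score.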