In this paper, we consider the problem of categorizing
videos of dynamic textures under varying viewpoint. We
propose to model each video with a collection of Linear
Dynamical Systems (LDSs) describing the dynamics of spatiotemporal
video patches. This bag of systems (BoS) representation
is analogous to the bag of features (BoF) representation,
except that we use LDSs as feature descriptors.
This choice poses several technical challenges to the BoF framework.
Most notably, LDSs do not live in a Euclidean space,
so novel methods for clustering LDSs and computing
LDS codewords must be developed. Our framework
addresses these issues by combining nonlinear dimensionality
reduction and clustering techniques with the Martin distance
for LDSs. Our experiments show that the
BoS approach can recognize dynamic textures
in challenging scenarios that existing dynamic texture
recognition methods cannot handle.
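To make the Martin distance mentioned above concrete, the following is a minimal sketch of how it can be approximated between two LDSs given by their state-transition and observation matrices (A, C). It uses the standard characterization of the Martin distance via the principal angles between the (here finite-horizon-truncated) extended observability subspaces; the function name, the horizon length `m`, and the example systems are our own illustrative choices, and stable dynamics (spectral radius of A below 1) are assumed.

```python
import numpy as np

def martin_distance(A1, C1, A2, C2, m=50):
    """Approximate Martin distance between two stable LDSs (A_i, C_i),
    using finite-horizon extended observability matrices of length m."""
    def obs(A, C, m):
        # Stack O = [C; CA; CA^2; ...; CA^(m-1)]
        blocks, M = [], np.eye(A.shape[0])
        for _ in range(m):
            blocks.append(C @ M)
            M = A @ M
        return np.vstack(blocks)

    # Orthonormal bases for the two observability subspaces
    Q1, _ = np.linalg.qr(obs(A1, C1, m))
    Q2, _ = np.linalg.qr(obs(A2, C2, m))
    # Singular values of Q1^T Q2 are the cosines of the principal angles
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    s = np.clip(s, 1e-12, 1.0)  # guard against log(0) and rounding > 1
    # Martin distance: d^2 = -ln prod_i cos^2(theta_i) = -2 sum_i ln cos(theta_i)
    return float(np.sqrt(-2.0 * np.sum(np.log(s))))
```

Because this quantity is not a Euclidean metric, the BoS pipeline pairs it with non-Euclidean tools (e.g. nonlinear dimensionality reduction and medoid-style clustering) rather than plain k-means on raw parameters.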