We present a novel method for analyzing social behavior. Continuous videos are segmented into action ‘bouts’ by a temporal context model that combines spatio-temporal energy features with agent trajectory features. The method is tested on an unprecedented dataset of videos of interacting pairs of mice, collected as part of a state-of-the-art neurophysiological study of behavior. The dataset comprises over 88 hours (8 million frames) of annotated video. We find that our novel trajectory features, used in a discriminative framework, are more informative than widely used spatio-temporal features. Furthermore, temporal context plays an important role in action recognition in continuous videos. Our approach can serve as a baseline method on this dataset, reaching a mean recognition
Xavier P. Burgos-Artizzu, Piotr Dollár, Day
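The abstract names two generic ingredients: features pooled over temporal context, and a segmentation of continuous video into action bouts. The sketch below illustrates both in isolation and is not the authors' model. It assumes a single hypothetical per-frame trajectory feature (for example, the distance between the two mice), hypothetical action names such as `"approach"`, and a hypothetical window radius. `window_context` averages that feature over a centered window. `frames_to_bouts` collapses per-frame labels, such as a frame-wise classifier might output, into contiguous bouts.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def window_context(features: Sequence[float], radius: int) -> List[float]:
    """Hypothetical temporal-context feature: mean of a per-frame value over a
    centered window of +/- radius frames, clipped at the sequence boundaries."""
    n = len(features)
    out = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        window = features[lo:hi]
        out.append(sum(window) / len(window))
    return out


def frames_to_bouts(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end) bouts; end is exclusive."""
    bouts = []
    t = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        bouts.append((label, t, t + length))
        t += length
    return bouts


if __name__ == "__main__":
    # Hypothetical inter-mouse distances for six frames.
    distance = [10.0, 8.0, 3.0, 2.0, 2.5, 9.0]
    print(window_context(distance, radius=1))
    # Hypothetical frame-wise predictions from some classifier.
    labels = ["other", "other", "approach", "approach", "approach", "other"]
    print(frames_to_bouts(labels))
    # [('other', 0, 2), ('approach', 2, 5), ('other', 5, 6)]
```

In practice one would compute context features at several window sizes and for many trajectory and spatio-temporal features, then feed them to a discriminative classifier. The single-feature, single-radius version above only shows the mechanics.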