This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. A complete system consisting of an adaptive tracker, an event generator, and the parser performs segmentation and labeling of a surveillance video of a parking lot; the system correctly identifies activities such as pick-up and drop-off, which involve person-vehicle interactions. The main contributions of this paper are extending the parsing algorithm to handle multi-agent interactions within a single parser, providing a general mechanism for consistency-based pruning, and developing an efficient incremental parsing algorithm.
Yuri A. Ivanov, Aaron F. Bobick