With recent advances in motion detection and tracking in video, more effort is being directed at higher-level video analysis such as recognizing actions, events, and activities. One of the more challenging problems is recognizing activities that involve multiple people and/or vehicles whose relationships change over time, particularly when motion detection and tracking are unreliable, as commonly occurs in busy scenes. We describe an approach to this problem based on Dynamic Bayesian Networks (DBNs), and show how DBNs can be extended to compensate for track failures. We also show that defining DBNs in terms of semantic concepts, rather than direct observables, improves robustness, and we discuss implications and ideas for incorporating semantic, symbolic knowledge into the perceptual domain of activity recognition.
Anthony Hoogs, A. G. Amitha Perera