We present a framework for annotating dynamic scenes involving occlusion and other uncertainties. Our system comprises an object tracker, an object classifier, and an algorithm for reasoning about spatio-temporal continuity. The principle behind the tracking and classification modules is to reduce error at the cost of increased ambiguity: objects in close proximity are merged into a single tracked region, and multiple classification hypotheses are presented rather than a single committed answer. The reasoning engine then resolves error, ambiguity and occlusion to produce a most likely hypothesis that is consistent with global spatio-temporal continuity constraints. The resulting annotation improves on that of frame-by-frame methods. The system has been implemented and applied to the analysis of a team sports video.
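To illustrate how a continuity constraint can override an unreliable per-frame classification, the following is a minimal sketch and not the reasoning algorithm of the system described here. It assumes a single tracked region, a hypothetical per-frame dictionary of classifier scores (`frame_scores`), and a hypothetical `switch_penalty` that stands in for the continuity constraint by penalising a change of identity between consecutive frames. A standard Viterbi-style dynamic programme then selects the most likely label sequence.

```python
import math


def most_likely_labels(frame_scores, switch_penalty=5.0):
    """Return the highest-scoring label sequence for one tracked region.

    frame_scores: list of dicts mapping label -> classifier score in (0, 1].
    switch_penalty: log-score cost of changing label between frames
                    (an assumed stand-in for a continuity constraint).
    """
    if not frame_scores:
        return []

    # Every label that appears in any frame is a candidate state.
    labels = sorted({label for frame in frame_scores for label in frame})
    floor = 1e-9  # avoids log(0) for labels missing from a frame

    def log_score(frame, label):
        return math.log(max(frame.get(label, 0.0), floor))

    def step_cost(prev, cur):
        return 0.0 if prev == cur else switch_penalty

    # best[label] = best log-score of any sequence ending in that label.
    best = {label: log_score(frame_scores[0], label) for label in labels}
    back_pointers = []

    for frame in frame_scores[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] - step_cost(p, cur))
            new_best[cur] = best[prev] - step_cost(prev, cur) + log_score(frame, cur)
            pointers[cur] = prev
        best = new_best
        back_pointers.append(pointers)

    # Trace the best sequence backwards from the best final label.
    path = [max(labels, key=lambda label: best[label])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: the middle frame is misclassified when taken alone.
frames = [
    {"player_A": 0.9, "player_B": 0.1},
    {"player_A": 0.4, "player_B": 0.6},
    {"player_A": 0.8, "player_B": 0.2},
]
print(most_likely_labels(frames))                      # continuity enforced
print(most_likely_labels(frames, switch_penalty=0.0))  # frame-by-frame choice
```

With the penalty set to zero the sketch reduces to choosing the best label independently in each frame, which is the frame-by-frame baseline. With a positive penalty, the isolated disagreement in the middle frame is overridden by the evidence from its neighbours.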
Brandon Bennett, Derek R. Magee, Anthony G. Cohn,