This paper describes a system that automatically builds appearance models of animals from a video sequence of the relevant animal, with no explicit supervisory information. The video sequence need not have any form of special background. Each animal is modeled as a 2D kinematic chain of rectangular segments, where the number of segments and the topology of the chain are unknown. The system detects possible segments, clusters segments whose appearance is coherent over time, and then builds a spatial model of such segment clusters. The resulting representation of the spatial configuration of the animal in each frame can be seen either as a track -- in which case the system described should be viewed as a generalized tracker that is capable of modeling objects while tracking them -- or as the source of an appearance model which can be used to build detectors for the particular animal. This is because knowing a video sequence is temporally coherent -- i.e., that a particular animal is pres...
Deva Ramanan, David A. Forsyth
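
As an illustration of the "detect possible segments, then cluster segments whose appearance is coherent over time" step summarized in the abstract, the following is a minimal sketch and not the authors' implementation. The `Segment` fields, the greedy threshold clustering, and the `radius` and `min_fraction` parameters are all assumptions made for this example; the abstract does not specify the appearance descriptor or the clustering method.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Segment:
    """A candidate rectangular segment (hypothetical fields, for illustration)."""
    frame: int         # index of the video frame the candidate was detected in
    center: tuple      # (x, y) image position of the rectangle
    angle: float       # orientation of the rectangle in radians
    appearance: tuple  # assumed appearance descriptor, e.g. mean colour


def cluster_by_appearance(segments, radius):
    """Greedy clustering (an assumed stand-in technique): a segment joins the
    first cluster whose seed descriptor lies within `radius`, else starts one."""
    clusters = []
    for seg in segments:
        for cluster in clusters:
            if dist(cluster[0].appearance, seg.appearance) <= radius:
                cluster.append(seg)
                break
        else:
            clusters.append([seg])
    return clusters


def temporally_coherent(clusters, n_frames, min_fraction):
    """Keep clusters whose members appear in at least `min_fraction` of frames."""
    kept = []
    for cluster in clusters:
        frames_seen = {seg.frame for seg in cluster}
        if len(frames_seen) / n_frames >= min_fraction:
            kept.append(cluster)
    return kept


# Toy detections: one limb-like segment with stable colour in every frame,
# plus two background candidates that each show up in a single frame.
detections = [
    Segment(0, (50, 40), 0.10, (0.80, 0.50, 0.20)),
    Segment(0, (10, 90), 1.20, (0.10, 0.90, 0.10)),
    Segment(1, (53, 41), 0.12, (0.82, 0.49, 0.21)),
    Segment(1, (70, 15), 0.70, (0.30, 0.30, 0.90)),
    Segment(2, (56, 43), 0.15, (0.79, 0.51, 0.19)),
    Segment(3, (60, 44), 0.11, (0.81, 0.50, 0.22)),
]

clusters = cluster_by_appearance(detections, radius=0.1)
kept = temporally_coherent(clusters, n_frames=4, min_fraction=0.75)
print(len(clusters), len(kept))          # 3 1
print([seg.frame for seg in kept[0]])    # [0, 1, 2, 3]
```

In this toy run only the cluster that persists across all four frames survives, which mirrors the idea that temporal coherence separates the animal's segments from background clutter. Building the spatial (kinematic chain) model over the surviving clusters is not shown here.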