Tracking people within a scene has been a longstanding challenge in the field of computer vision. A common approach involves matching the background against the incoming video stream, with the assumption that any unmatched pixels belong to the people being tracked. Such methods, however, seem intrinsically flawed, as they do not incorporate any specific characteristics of the target in question, such as motion or shape, and their performance tends to be both limited and contingent upon a semi-static background. To overcome these deficiencies, we propose a saliency-based approach, which requires minimal a priori information concerning the target. Motion characteristics dictate a saliency map, and highly salient regions contribute to the automated acquisition of target-specific features. In addition to improved robustness, the algorithm is independent of a background model, requires no explicit interaction with the user, and imposes no restrictions on the target.
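To make the pipeline described above concrete, the following is a minimal sketch, assuming the saliency map is derived from simple frame differencing and that target-specific features are taken as an intensity histogram over the most salient pixels; the function names (motion_saliency, salient_mask, acquire_target_features) and these particular choices are illustrative placeholders, not the authors' actual formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def motion_saliency(prev_gray, cur_gray, smooth=5):
    """Motion-driven saliency (assumed form): |I_t - I_{t-1}|,
    box-smoothed and rescaled to [0, 1]."""
    diff = np.abs(cur_gray.astype(np.float32) - prev_gray.astype(np.float32))
    sal = uniform_filter(diff, size=smooth)
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else np.zeros_like(sal)


def salient_mask(saliency, quantile=0.95):
    """Binary mask of the most salient pixels (top 5% by default)."""
    return saliency >= np.quantile(saliency, quantile)


def acquire_target_features(cur_gray, mask, bins=32):
    """Example target-specific feature: a normalised intensity histogram
    computed only over highly salient pixels, usable for later matching."""
    vals = cur_gray[mask]
    hist, _ = np.histogram(vals, bins=bins, range=(0, 255), density=True)
    return hist


# Usage with two consecutive greyscale frames (uint8 arrays of equal size):
# sal = motion_saliency(prev, cur)
# feats = acquire_target_features(cur, salient_mask(sal))
```

Note that nothing in this sketch refers to a background model or to user-supplied target descriptions; both the saliency map and the acquired features are computed directly from the incoming frames, which is the property the abstract emphasises.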
Shawn Arseneau, Jeremy R. Cooperstock