We present a unified method for simultaneously acquiring both the location and the silhouette shape of people in outdoor scenes. The proposed algorithm integrates top-down and bottom-up processes in a balanced manner, employing both appearance and motion cues at different perceptual levels. Without requiring manually segmented training data, the algorithm employs a simple top-down procedure to capture the high-level cue of object familiarity. Motivated by regularities in the shape and motion characteristics of humans, interactions among low-level contour features are exploited to extract mid-level perceptual cues such as smooth continuation, common fate, and closure. A Markov random field formulation is presented that effectively combines the various cues from the top-down and bottom-up processes. The algorithm is extensively evaluated on static and moving pedestrian datasets for both detection and segmentation.
Vinay Sharma, James W. Davis
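
As a rough illustration of how a Markov random field can combine top-down and bottom-up cues, the sketch below labels a handful of contour features as figure (1) or background (0). It is a generic pairwise MRF minimized with iterated conditional modes, not the formulation or inference procedure used by the authors, which the abstract does not specify. The names `icm_binary_mrf`, `unary`, and `affinity`, and all numeric values, are hypothetical: `unary` stands in for a top-down familiarity cost, and `affinity` stands in for bottom-up grouping strength between features (for example from smooth continuation or common fate).

```python
import numpy as np

def icm_binary_mrf(unary, affinity, n_sweeps=10):
    """Greedy labeling for a binary pairwise MRF (iterated conditional modes).

    unary:    (n, 2) array; unary[i, l] is the cost of giving feature i label l.
    affinity: (n, n) symmetric, non-negative array; affinity[i, j] is the
              penalty paid when features i and j receive different labels.
    Returns an (n,) integer array of labels in {0, 1}.
    """
    n = unary.shape[0]
    labels = np.argmin(unary, axis=1)  # start from the unary-only solution
    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            costs = np.empty(2)
            for lab in (0, 1):
                # Neighbors that would disagree with feature i under this label.
                disagree = labels != lab
                disagree[i] = False
                costs[lab] = unary[i, lab] + affinity[i, disagree].sum()
            best = int(np.argmin(costs))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # reached a local minimum of the energy
            break
    return labels

# Hypothetical toy input: features 0 and 1 look strongly like a person,
# feature 2 is ambiguous, and feature 3 looks like background.
unary = np.array([[2.0, 0.0],
                  [2.0, 0.0],
                  [0.4, 0.6],
                  [0.0, 2.0]])
affinity = np.zeros((4, 4))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 0.1)]:
    affinity[i, j] = affinity[j, i] = w  # keep the matrix symmetric

labels = icm_binary_mrf(unary, affinity)
print(labels)
```

In this toy input, the ambiguous feature 2 slightly prefers the background label on its own, but its strong grouping with features 0 and 1 pulls it into the figure. That is the kind of interaction between cue levels the abstract describes. Iterated conditional modes only reaches a local minimum, so it is used here for brevity rather than as a recommended inference method.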