A Bayesian marked point process (MPP) model is developed
to detect and count people in crowded scenes. The
model couples a spatial stochastic process governing the number
and placement of individuals with a conditional mark
process for selecting body shape. We automatically learn
the mark (shape) process from training video by estimating
a mixture of Bernoulli shape prototypes along with an
extrinsic shape distribution describing the orientation and
scaling of these shapes for any given image location. The
reversible jump Markov chain Monte Carlo (RJMCMC) framework is
used to efficiently search for the maximum a posteriori configuration
of shapes, leading to an estimate of the count,
location and pose of each person in the scene. Quantitative
results of crowd counting are presented for two publicly
available datasets with known ground truth.
Robert T. Collins, Weina Ge
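
To make the "mixture of Bernoulli shape prototypes" mentioned in the abstract concrete, the following is a minimal sketch of fitting such a mixture with expectation-maximization (EM). It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the estimation procedure, so the use of EM, the function name fit_bernoulli_mixture, and the parameters k, n_iter, and seed are all hypothetical. Each training shape is assumed to be a binary silhouette mask flattened into one row of X.

```python
import numpy as np


def fit_bernoulli_mixture(X, k, n_iter=50, seed=0):
    """Fit a k-component Bernoulli mixture to binary rows of X with EM.

    Hypothetical helper for illustration only.
    X      : (n, d) array of 0/1 values, one flattened silhouette per row.
    Returns (weights, protos, resp):
      weights : (k,) mixing proportions
      protos  : (k, d) per-pixel foreground probabilities (soft prototypes)
      resp    : (n, k) posterior responsibility of each component per sample
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    eps = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite

    weights = np.full(k, 1.0 / k)
    # Random initialisation breaks the symmetry between components.
    protos = rng.uniform(0.25, 0.75, size=(k, d))

    for _ in range(n_iter):
        # E-step: log of weight_c * prod_j p_cj^x_j * (1 - p_cj)^(1 - x_j)
        log_p = (X @ np.log(protos).T
                 + (1.0 - X) @ np.log(1.0 - protos).T
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted averages of the binary masks.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        protos = np.clip((resp.T @ X) / nk[:, None], eps, 1.0 - eps)

    return weights, protos, resp
```

Each row of protos can be reshaped back to the mask dimensions and viewed as a soft shape template. In the model described above, such prototypes would then be oriented and scaled by the extrinsic shape distribution, which this sketch does not cover.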