Extremely crowded scenes present unique challenges to
video analysis that cannot be addressed with conventional
approaches. We present a novel statistical framework for
modeling the local spatio-temporal motion pattern behav-
ior of extremely crowded scenes. Our key insight is to ex-
ploit the dense activity of the crowded scene by modeling
the rich motion patterns in local areas, effectively capturing
the underlying intrinsic structure they form in the video. In
other words, we model the motion variation of local space-
time volumes and their spatio-temporal statistical behav-
iors to characterize the overall behavior of the scene. We
demonstrate that by capturing the steady-state motion be-
havior with these spatio-temporal motion pattern models,
we can naturally detect unusual activity as statistical de-
viations. Our experiments show that local spatio-temporal
motion pattern modeling offers promising results in real-
world scenes with complex activities that are ha...