Abstract. This paper presents a unified approach to crowd segmentation. A global solution is generated using an Expectation Maximization framework. Initially, a head and shoulder detector is used to nominate an exhaustive set of person locations and these form the person hypotheses. The image is then partitioned into a grid of small patches which are each assigned to one of the person hypotheses. A key idea of this paper is that while whole body monolithic person detectors can fail due to occlusion, a partial response to such a detector can be used to evaluate the likelihood of a single patch being assigned to a hypothesis. This captures local appearance information without having to learn specific appearance models. The likelihood of a pair of patches being assigned to a person hypothesis is evaluated based on low level image features such as uniform motion fields and color constancy. During the E-step, the single and pairwise likelihoods are used to compute a globally optimal set of ...