Our goal is to segment multiple interacting and deforming agents in a video. Detectors often fail under large body deformations or when agents are entangled. Segmentation based on motion similarity, on the other hand, fails to separate similarly moving agents or to group distinctly moving articulated body parts. We employ novel cues and representations for spatio-temporal grouping: figure-ground segregation, scene topology, and motion similarity are used to cluster dense feature trajectories. We maintain an explicit figure-ground representation and impose foreground connectedness constraints through repulsive forces between disconnected foreground trajectories. As a result, our spatio-temporal segments have a semantic interpretation as connected moving entities rather than merely groups of similar motion. We further exploit cue prioritization: affinities between long trajectories are more reliable than those between short ones and are thus more informative for grouping. Our framework is general and unifies tracking and v...
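To make the grouping idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of clustering dense point trajectories: attractive affinities come from motion similarity and are weighted by trajectory overlap (longer overlap, more reliable cue), repulsive terms are added between disconnected foreground trajectory pairs, and a signed spectral clustering step produces the segments. The function names (motion_affinity, build_affinities, cluster_trajectories), the inputs fg_mask and connected, and the parameters alpha and sigma are all illustrative assumptions.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def motion_affinity(traj_a, traj_b, sigma=1.0):
    # Attractive affinity from velocity similarity over the shared lifespan.
    # Each trajectory is an array of shape (T, 2) of image coordinates.
    va, vb = np.diff(traj_a, axis=0), np.diff(traj_b, axis=0)
    t = min(len(va), len(vb))                      # frames compared
    d = np.linalg.norm(va[:t] - vb[:t], axis=1).max()
    w = t / max(len(va), len(vb))                  # longer overlap -> more reliable cue
    return w * np.exp(-d ** 2 / (2 * sigma ** 2))

def build_affinities(trajs, fg_mask, connected, alpha=0.5):
    # Combine motion similarity with repulsion (negative affinity) between
    # foreground trajectories that are not spatially connected.
    n = len(trajs)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a = motion_affinity(trajs[i], trajs[j])
            if fg_mask[i] and fg_mask[j] and not connected[i, j]:
                a -= alpha                          # repulsive force
            A[i, j] = A[j, i] = a
    return A

def cluster_trajectories(A, n_clusters=2):
    # Spectral clustering on a signed affinity matrix: use the signed
    # Laplacian so repulsive (negative) edges push trajectories apart.
    D = np.diag(np.abs(A).sum(axis=1) + 1e-9)
    L = D - A
    _, vecs = eigh(L, D, subset_by_index=[0, n_clusters - 1])
    return KMeans(n_clusters, n_init=10).fit_predict(vecs)

This is only one plausible instantiation of affinity-based trajectory clustering with repulsive constraints; the actual cues (figure-ground segregation, scene topology) and the grouping machinery used in the paper are more involved.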