Scene understanding from a monocular, moving camera is a challenging problem with a number of applications including robotics and automotive safety. While recent systems have shown that this is best accomplished with a 3D scene model, handling of partial object occlusion is still unsatisfactory. In this paper we propose an approach that tightly integrates monocular 3D scene trackingby-detection with explicit object-object occlusion reasoning. Full object and object part detectors are combined in a mixture of experts based on their expected visibility, which is obtained from the 3D scene model. For the difficult case of multi-people tracking, we demonstrate that our approach yields more robust detection and tracking of partially visible pedestrians, even when they are occluded over long periods of time. Our approach is evaluated on two challenging sequences recorded from a moving camera in busy pedestrian zones and outperforms several state-of-the-art approaches.