Vision systems for service robotics applications must cope with varying environmental conditions, partial occlusions, complex backgrounds, and a large number of distractors (clutter) in the scene. This paper presents a new approach targeted at such application scenarios that seamlessly integrates segmentation, object recognition, 3D localization, and tracking. The unifying framework is a probabilistic representation of the various aspects of the scene. Experiments indicate that the approach is viable and yields very satisfactory results.