This paper presents a visual particle filter for tracking a variable number of humans interacting in indoor environments using multiple cameras. It builds on a descriptive three-dimensional appearance model that comprises (i) a 3D shape model assembled from simple body-part elements and (ii) a fast yet reliable rendering procedure based on key views of previously acquired body-part color histograms. From this model we derive a likelihood function which, embedded in an occlusion-robust multibody tracker, enables ID-persistent 3D tracking in cluttered environments. We describe the model rendering and target detection procedures in detail and report a quantitative evaluation of the approach on the ‘CLEAR’07 3D Person Tracking’ corpus.
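
For orientation, the following minimal sketch illustrates how a color-histogram likelihood can drive one cycle of a generic bootstrap particle filter over 3D position hypotheses. It is not the tracker described in this paper: the Bhattacharyya-based likelihood, the Gaussian motion model, the multinomial resampling, and all names and parameters (`histogram_likelihood`, `filter_step`, `observe`, `sigma`, `motion_std`) are assumptions chosen for illustration only.

```python
import math
import random


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def histogram_likelihood(observed, reference, sigma=0.2):
    """Assumed likelihood: Gaussian in the squared Bhattacharyya distance."""
    d2 = max(0.0, 1.0 - bhattacharyya(observed, reference))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


def filter_step(particles, observe, reference, rng, motion_std=0.05):
    """One predict / weight / resample cycle over (x, y, z) hypotheses.

    `observe` is a hypothetical callback that returns the normalized color
    histogram seen at a hypothesized 3D position.
    """
    # Predict: diffuse each hypothesis with isotropic Gaussian noise.
    predicted = [tuple(c + rng.gauss(0.0, motion_std) for c in p)
                 for p in particles]
    # Weight: compare the observed histogram with the reference histogram.
    weights = [histogram_likelihood(observe(p), reference) for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Estimate: weighted mean position, computed before resampling.
    estimate = tuple(sum(w * p[i] for w, p in zip(weights, predicted))
                     for i in range(3))
    return resample(predicted, weights, rng), estimate
```

In a multi-camera setting, one plausible extension of this sketch is to multiply the per-view likelihoods of each hypothesis before normalizing the weights; that choice is likewise an assumption rather than the formulation derived in the paper.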