This paper presents two multimodal systems for tracking multiple users in smart environments. The first is a multiview particle filter tracker that uses foreground and color cues together with special upper-body detection and person-region features. The second is a wide-angle overhead-view person tracker that relies on foreground segmentation and model-based blob tracking. Both systems are complemented by a source localizer based on a joint probabilistic data association filter, which takes its input from several microphone arrays. The first system fuses audio and visual cues at the feature level, whereas the second combines them at the decision level using state-based heuristics. Both systems are designed to estimate the 3D scene locations of room occupants. They are evaluated on their precision in estimating person locations, their accuracy in recognizing person configurations, and their ability to maintain consistent track identities over time. The trackers are extensively tested and compared, for...