This article presents a modular, distributed, and scalable many-camera system designed to track multiple people simultaneously in a natural human-robot interaction scenario set in an apartment mock-up. The system employs 40 high-resolution cameras networked to 15 computers, redundantly covering an area of approximately 100 square meters. The unique scale and setup of the system require novel approaches to vision-based tracking, especially for transferring targets between the different tracking processes while preserving target identities. We propose an integrated approach to these challenges, focusing on the system architecture, the management of target information, the calibration of the cameras, and the tracking methods applied.