We are interested in the content analysis of video from a collection of spatially distant cameras viewing a single environment. We address the task of counting the number of distinct people walking through such an environment, which requires the system to identify which observations from different cameras show the same person. Our system achieves this by combining visual appearance matching with mutual content constraints between the cameras. We present results from a system with four very different camera views that counts people walking through and around a research lab.
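As an illustration only (the abstract does not specify the descriptor used), appearance matching across cameras is commonly sketched with normalized color histograms compared by the Bhattacharyya coefficient; the function names and bin count below are assumptions, not the paper's method:

```python
import numpy as np

def color_histogram(pixels, bins=8):
    """Quantize RGB pixels (N x 3 array, values 0-255) into a
    normalized joint color histogram used as an appearance descriptor.
    The bin count is an illustrative choice, not from the paper."""
    idx = (pixels // (256 // bins)).astype(int)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Similarity of two normalized histograms: 1.0 means identical,
    0.0 means no overlapping color mass."""
    return float(np.sum(np.sqrt(p * q)))
```

Two observations of the same person from different cameras would ideally yield histograms with a coefficient near 1.0; mutual content constraints (e.g. a person cannot be in two places at once) then prune matches that appearance alone cannot resolve.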