In this paper, we propose an efficient method for simultaneously labeling and mapping targets in a multi-camera surveillance system. We first fuse the detection results from multiple cameras into a posterior distribution that indicates the likelihood of moving targets being present on the ground plane. From this distribution, isolated targets and their 3-D positions are identified in a sample-based manner that combines Markov Chain Monte Carlo (MCMC) sampling with Mean-Shift clustering. The inferred 3-D scene information is then fed into a 3-layer Bayesian hierarchical framework (BHF), which adopts a Markov network to solve the object labeling and correspondence problems. Labeling and correspondence are treated as a unified optimization problem subject to the 3-D scene prior, image color similarity, and detection results. Experiments show that accurate results can be obtained even under severe occlusion.
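
To make the sample-based target identification step concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It replaces the fused multi-camera posterior with a hypothetical toy density (`toy_posterior`, a sum of isotropic Gaussians on a 2-D ground plane), draws samples with a plain random-walk Metropolis sampler, and groups the samples with a Gaussian-kernel Mean-Shift procedure. All function names, the step size, the bandwidth, the thinning factor, and the minimum-support threshold are illustrative choices.

```python
import math
import random


def toy_posterior(x, y, targets, sigma=0.4):
    """Hypothetical stand-in for the fused posterior: an unnormalised
    sum of isotropic Gaussians, one per target on the ground plane."""
    return sum(
        math.exp(-((x - tx) ** 2 + (y - ty) ** 2) / (2.0 * sigma ** 2))
        for tx, ty in targets
    )


def metropolis_samples(density, start, n_samples, step=1.5, burn_in=500, seed=0):
    """Random-walk Metropolis sampler over 2-D ground-plane positions."""
    rng = random.Random(seed)
    x, y = start
    p = density(x, y)
    samples = []
    for i in range(n_samples + burn_in):
        cx, cy = x + rng.gauss(0.0, step), y + rng.gauss(0.0, step)
        cp = density(cx, cy)
        # Accept the candidate with probability min(1, cp / p).
        if p == 0.0 or rng.random() < cp / p:
            x, y, p = cx, cy, cp
        if i >= burn_in:
            samples.append((x, y))
    return samples


def mean_shift_modes(points, bandwidth=0.5, tol=1e-3, max_iter=50, min_support=1):
    """Gaussian-kernel Mean-Shift: move every point uphill to a density mode,
    merge nearby end points, and return (x, y, support) for each mode."""
    inv = 1.0 / (2.0 * bandwidth ** 2)
    modes = []  # each entry is [x, y, number of points that converged here]
    for px, py in points:
        mx, my = px, py
        for _ in range(max_iter):
            wsum = sx = sy = 0.0
            for qx, qy in points:
                w = math.exp(-((qx - mx) ** 2 + (qy - my) ** 2) * inv)
                wsum += w
                sx += w * qx
                sy += w * qy
            nx, ny = sx / wsum, sy / wsum
            shift = math.hypot(nx - mx, ny - my)
            mx, my = nx, ny
            if shift < tol:
                break
        for mode in modes:
            if math.hypot(mode[0] - mx, mode[1] - my) < bandwidth / 2.0:
                mode[2] += 1
                break
        else:
            modes.append([mx, my, 1])
    return [(m[0], m[1], m[2]) for m in modes if m[2] >= min_support]


def locate_targets(targets, start=(1.0, 0.0), n_samples=4000, thin=20,
                   bandwidth=0.5, min_support=10):
    """Sample the toy posterior, thin the chain, and cluster the samples."""
    chain = metropolis_samples(
        lambda x, y: toy_posterior(x, y, targets), start, n_samples
    )
    return mean_shift_modes(chain[::thin], bandwidth=bandwidth,
                            min_support=min_support)


if __name__ == "__main__":
    for x, y, support in sorted(locate_targets([(0.0, 0.0), (2.5, 0.0)])):
        print(f"target near ({x:.2f}, {y:.2f}), supported by {support} samples")
```

In this sketch the number of targets is not given in advance: it is simply the number of modes that attract enough samples. The minimum-support threshold discards stray samples that settle in low-density regions, and thinning the chain keeps the quadratic cost of Mean-Shift manageable.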