Abstract. This article presents a novel framework for registering and fusing heterogeneous sensory data. Our approach geometrically registers sensory data onto a set of virtual parallel planes and then applies an occupancy grid to each layer. This framework is useful for surveillance applications involving multi-modal sensors, especially in tracking and human behavior understanding. The multi-modal sensor set in this work comprises cameras, inertial measurement units (IMUs), laser range finders (LRFs) and a binaural sensing system. An individual registration approach is proposed for each of these sensors. After registering the multi-modal sensory data on the various geometrically parallel planes, a two-dimensional occupancy grid (one per layer) is applied to each plane.
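The layered occupancy-grid idea from the abstract can be sketched as follows. This is an illustrative assumption, not the authors' implementation: registered 3D points from any sensor are projected onto the geometrically nearest virtual plane, and each plane keeps its own 2D hit-count grid. All class and parameter names (`LayeredOccupancyGrid`, `plane_heights`, `cell_size`, etc.) are hypothetical.

```python
# Sketch (assumed, not from the paper): one 2D occupancy grid per virtual
# parallel plane; a registered 3D point is assigned to the closest plane.

class LayeredOccupancyGrid:
    def __init__(self, plane_heights, width, depth, cell_size):
        self.plane_heights = sorted(plane_heights)  # z of each virtual plane (m)
        self.cell_size = cell_size                  # grid resolution (m/cell)
        self.cols = int(width / cell_size)
        self.rows = int(depth / cell_size)
        # One 2D hit-count grid (layer) per plane
        self.layers = [[[0] * self.cols for _ in range(self.rows)]
                       for _ in self.plane_heights]

    def _nearest_layer(self, z):
        # Assign a measurement to the geometrically closest plane
        return min(range(len(self.plane_heights)),
                   key=lambda i: abs(self.plane_heights[i] - z))

    def register_point(self, x, y, z):
        """Project an already-registered 3D point onto its layer's 2D grid."""
        i, j = int(y / self.cell_size), int(x / self.cell_size)
        if 0 <= i < self.rows and 0 <= j < self.cols:
            self.layers[self._nearest_layer(z)][i][j] += 1

# Usage: fuse points originating from different (already calibrated) sensors
grid = LayeredOccupancyGrid(plane_heights=[0.0, 0.9, 1.7],
                            width=10.0, depth=10.0, cell_size=0.5)
grid.register_point(2.3, 4.1, 0.85)  # e.g. an LRF return near the 0.9 m plane
grid.register_point(2.3, 4.1, 1.65)  # e.g. a camera detection near 1.7 m
```

Because each layer is an independent 2D grid, per-plane fusion stays simple: any sensor whose data can be registered to a plane contributes to the same grid, regardless of modality.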