This paper presents a layered method for resolving the visibility problem in depth image-based rendering. Each reference view is given a novel three-layer representation consisting of a main layer, a background layer, and a boundary layer. A spatio-temporal method generates the boundary layer for pixel-based rendering (splatting). In addition, a temporal background model is built for each frame by searching backward and forward in the reference video for uncovered background information, based on depth variance. View synthesis results on the multi-view 3D data sets from Microsoft Research, "break dancer" and "ballet", are given to demonstrate the performance of the proposed method.
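As an illustration of the temporal background idea only, the following minimal Python sketch shows one plausible way to recover uncovered background for a frame from a depth-augmented video. It is not the paper's algorithm: the function name `temporal_background`, the parameter `var_threshold`, the rule of taking the farthest sample over time, and the convention that a larger depth value means farther from the camera are all assumptions made for this example.

```python
def temporal_background(colors, depths, t, var_threshold=1.0):
    """Estimate a background layer for frame t (illustrative sketch).

    colors[f][y][x] and depths[f][y][x] hold per-pixel samples for frame f.
    Assumption: a larger depth value means farther from the camera.
    """
    n_frames = len(depths)
    height = len(depths[0])
    width = len(depths[0][0])

    # Start from the current frame; static pixels are left unchanged.
    bg_color = [[colors[t][y][x] for x in range(width)] for y in range(height)]
    bg_depth = [[depths[t][y][x] for x in range(width)] for y in range(height)]

    for y in range(height):
        for x in range(width):
            samples = [depths[f][y][x] for f in range(n_frames)]
            mean = sum(samples) / n_frames
            variance = sum((d - mean) ** 2 for d in samples) / n_frames

            # Low temporal depth variance: nothing moved across this pixel.
            if variance <= var_threshold:
                continue

            # High variance: look backward and forward in time and take the
            # farthest sample, preferring the frame closest in time to t.
            best = max(range(n_frames),
                       key=lambda f: (samples[f], -abs(f - t)))
            bg_color[y][x] = colors[best][y][x]
            bg_depth[y][x] = samples[best]

    return bg_color, bg_depth


# Toy sequence: a 1x2 image over 3 frames; a foreground object covers the
# second pixel in the middle frame only.
toy_colors = [[["wall", "wall"]], [["wall", "dancer"]], [["wall", "wall"]]]
toy_depths = [[[10, 10]], [[10, 2]], [[10, 10]]]
toy_bg_color, toy_bg_depth = temporal_background(toy_colors, toy_depths, t=1)
```

In this toy sequence, the second pixel is occluded in the middle frame, so its background sample is taken from a neighboring frame where the farther surface is visible. A practical system would also need to handle camera motion, depth noise, and regions that are never uncovered, none of which this sketch addresses.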