This paper proposes combining spatial and color coherency with the pixel-wise Gaussian mixture model (GMM) to build the background model. We first represent each pixel with a hybrid feature vector comprising its GMM likelihood, color, and spatial features, and estimate the density of these features for each video frame with a non-parametric method. Next, we apply a clustering process that segments the video frame into clusters with similar hybrid features. Finally, for each cluster, we replace the background likelihood of its pixels with the GMM likelihood at the cluster mode. The resulting background model is therefore a GMM smoothed according to spatial and color coherency. For accurate object detection, we develop an adaptive thresholding scheme based on a Markov random field (MRF). Moreover, to reduce the computational load, we propose a filtering step that lets pixels skip the time-consuming clustering process. Our experimental results and comparisons demonstrate that the proposed background model indeed achieves better detection...
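The smoothing pipeline described above (hybrid features, non-parametric density estimation, clustering, then replacing each pixel's likelihood with the likelihood at its cluster mode) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not name the clustering algorithm, so this sketch assumes Gaussian-kernel mean-shift clustering, a standard way to find modes of a non-parametric density. The per-pixel GMM likelihoods are taken as a given input array. The feature scales, `bandwidth`, `merge_radius`, and the likelihood band `[lo, hi]` used as the pixel filter are all hypothetical choices. The MRF-based adaptive thresholding step is not shown.

```python
import numpy as np


def hybrid_features(likelihood, image, lik_scale, color_scale, spatial_scale):
    """Stack each pixel's (GMM likelihood, color, x, y) into an (N, D) array.

    Each feature group is divided by a hypothetical scale so that a single
    kernel bandwidth can be used across all dimensions.
    """
    h, w = likelihood.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.column_stack([
        likelihood.ravel() / lik_scale,
        image.reshape(h * w, -1) / color_scale,  # works for gray or RGB frames
        xs.ravel() / spatial_scale,
        ys.ravel() / spatial_scale,
    ])


def mean_shift_modes(feats, bandwidth=1.0, iters=50, tol=1e-4):
    """Gaussian-kernel mean shift: move every point uphill on the KDE of feats."""
    modes = feats.astype(float).copy()
    for _ in range(iters):
        # Squared distance from each current estimate to every data point: (M, N).
        d2 = ((modes[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
        weights = np.exp(-0.5 * d2 / bandwidth ** 2)
        new = weights @ feats / weights.sum(axis=1, keepdims=True)
        shift = np.abs(new - modes).max()
        modes = new
        if shift < tol:
            break
    return modes


def label_modes(modes, merge_radius):
    """Group points whose converged modes lie within merge_radius of each other."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = k
                break
        else:  # no existing center is close enough: start a new cluster
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


def smooth_likelihood(likelihood, image, lo=0.05, hi=0.95, lik_scale=0.2,
                      color_scale=50.0, spatial_scale=3.0, bandwidth=1.0,
                      merge_radius=0.5):
    """Replace each clustered pixel's likelihood with the likelihood at its mode."""
    feats = hybrid_features(likelihood, image, lik_scale, color_scale, spatial_scale)
    lik = likelihood.ravel().astype(float)
    # Hypothetical filter: pixels whose likelihood is already decisive skip clustering.
    ambiguous = (lik >= lo) & (lik <= hi)
    out = lik.copy()
    if ambiguous.any():
        modes = mean_shift_modes(feats[ambiguous], bandwidth)
        labels, centers = label_modes(modes, merge_radius)
        # Column 0 of a mode is its scaled GMM-likelihood coordinate.
        out[ambiguous] = centers[labels, 0] * lik_scale
    return out.reshape(likelihood.shape)
```

The pairwise distance matrix makes this naive mean shift quadratic in the number of clustered pixels, in both time and memory. It is therefore practical only for small frames or for the reduced pixel set left after filtering, which illustrates why a filtering step that removes pixels from clustering can pay off.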