This paper proposes a robust video object segmentation algorithm for complex conditions in surveillance systems. The algorithm uses an unsupervised K-Means background clustering technique to model the temporal distribution of RGB values at each spatial position. Based on this background model, the object mask generation process integrates noise reduction, cast shadow cancellation, and an improved watershed transform to obtain satisfactory object masks. Experiments show that the algorithm can be applied to low-frame-rate and noisy video sequences in surveillance systems, where temporal tracking becomes impractical, and that it achieves better segmentation results than previous work under complex lighting conditions and in outdoor scenes.
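To illustrate the idea of clustering the temporal RGB distribution at each pixel, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes a batch K-Means over a stack of training frames, Euclidean distance in RGB, and hypothetical names and parameters (`fit_background`, `foreground_mask`, `k`, `min_weight`, `threshold`) chosen only for illustration. Noise reduction, cast shadow cancellation, and the watershed step are omitted.

```python
import numpy as np


def fit_background(frames, k=3, iters=10):
    """Cluster the RGB samples observed over time at every pixel.

    frames: array of shape (T, H, W, 3).
    Returns centres of shape (H, W, k, 3) and weights of shape (H, W, k).
    """
    t, h, w, _ = frames.shape
    # One row of T temporal samples per pixel: (P, T, 3) with P = H * W.
    samples = frames.reshape(t, h * w, 3).transpose(1, 0, 2).astype(np.float64)
    # Assumption: initialise the k centres from evenly spaced frames.
    idx = np.linspace(0, t - 1, k).astype(int)
    centres = samples[:, idx, :].copy()  # (P, k, 3)

    for _ in range(iters):
        # Squared distance from every sample to every centre: (P, T, k).
        d = ((samples[:, :, None, :] - centres[:, None, :, :]) ** 2).sum(-1)
        labels = d.argmin(-1)  # (P, T)
        for j in range(k):
            member = (labels == j)[..., None]   # (P, T, 1)
            count = member.sum(1)               # (P, 1)
            total = (samples * member).sum(1)   # (P, 3)
            # Keep the previous centre when a cluster received no samples.
            centres[:, j, :] = np.where(
                count > 0, total / np.maximum(count, 1), centres[:, j, :]
            )

    # Weight of a cluster = fraction of samples in the last assignment.
    weights = np.stack([(labels == j).mean(1) for j in range(k)], axis=1)
    return centres.reshape(h, w, k, 3), weights.reshape(h, w, k)


def foreground_mask(frame, centres, weights, min_weight=0.2, threshold=30.0):
    """Mark pixels that are far from every sufficiently frequent cluster."""
    diff = frame[:, :, None, :].astype(np.float64) - centres  # (H, W, k, 3)
    dist = np.sqrt((diff ** 2).sum(-1))                       # (H, W, k)
    # Assumption: clusters seen rarely are not treated as background.
    dist = np.where(weights >= min_weight, dist, np.inf)
    return dist.min(-1) > threshold
```

In this sketch, a pixel is labelled foreground when its RGB value lies farther than `threshold` from all clusters whose weight is at least `min_weight`. Because the model is built per pixel from the temporal distribution alone, it does not rely on frame-to-frame tracking, which is consistent with the low-frame-rate setting described above.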