In this paper, we propose a Bayesian approach to video object segmentation. Our method consists of two stages. In the first stage, we partition the video data into a set of 3D watershed volumes, where each watershed volume is a series of corresponding 2D image regions. These 2D image regions are obtained by applying marker-controlled watershed segmentation to each image frame. The markers are extracted by first generating a set of initial markers via temporal tracking and then refining them with two shrinking schemes: iterative adaptive erosion and verification against a pre-simplified watershed segmentation. In the second stage, we use a Markov random field to model the spatio-temporal relationships among the 3D watershed volumes obtained from the first stage. The desired video objects are then extracted by merging watershed volumes that have similar motion characteristics within a Bayesian framework. A major advantage of this method is that it ...
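To make the second stage more concrete, the following is a minimal sketch of merging volumes by motion similarity under a Markov random field prior. It is not the formulation used in this paper: it assumes a Potts smoothness prior, a squared-distance data term, a fixed set of candidate motion models, and iterated conditional modes (ICM) as the optimizer. All names (`icm_merge`, `motions`, `edges`, `models`, `beta`) and the toy numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def icm_merge(motions: Dict[int, Vec],
              edges: List[Tuple[int, int]],
              models: List[Vec],
              beta: float = 1.0,
              max_iters: int = 20) -> Dict[int, int]:
    """Assign each volume a motion-model label by ICM (illustrative only).

    Energy per volume = squared distance between its mean motion and the
    model motion (data term) + beta * number of adjacent volumes carrying
    a different label (assumed Potts prior).
    """
    # Build a symmetric adjacency list from the undirected edge list.
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def data_cost(v: int, k: int) -> float:
        (mx, my), (cx, cy) = motions[v], models[k]
        return (mx - cx) ** 2 + (my - cy) ** 2

    def local_energy(v: int, k: int, labels: Dict[int, int]) -> float:
        disagreements = sum(1 for n in neighbors[v] if labels[n] != k)
        return data_cost(v, k) + beta * disagreements

    ks = range(len(models))
    # Initialise every volume with its nearest motion model (data term only).
    labels = {v: min(ks, key=lambda k: data_cost(v, k)) for v in motions}

    for _ in range(max_iters):
        changed = False
        for v in sorted(motions):
            best = min(ks, key=lambda k: local_energy(v, k, labels))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:  # reached a local minimum of the energy
            break
    return labels


def group_by_label(labels: Dict[int, int]) -> Dict[int, List[int]]:
    """Collect volumes that share a label into one merged object."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for v in sorted(labels):
        groups[labels[v]].append(v)
    return dict(groups)


# Toy data: five volumes, two candidate motions (background, moving object).
motions = {0: (0.1, 0.0), 1: (2.2, 0.0), 2: (0.0, 0.2),
           3: (3.9, 0.0), 4: (4.1, 0.1)}
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
models = [(0.0, 0.0), (4.0, 0.0)]

labels = icm_merge(motions, edges, models, beta=1.0)
objects = group_by_label(labels)
```

In this toy setup, volume 1 has a noisy motion estimate that is slightly closer to the moving-object model, but both of its neighbors agree on the background label, so the smoothness term pulls it into the background group. Setting `beta=0.0` removes the prior and leaves volume 1 with its nearest-model label, which shows the role the spatial relationship plays in the merge.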