Video segmentation requires partitioning a sequence of images into groups that are both spatially coherent and smooth along the time axis. We formulate segmentation as a Bayesian clustering problem: context information is propagated over time by a conjugate structure, and the level of segment resolution is controlled by a Dirichlet process prior. Our contributions include a conjugate nonparametric Bayesian model for clustering in multivariate time series, an MCMC inference algorithm, and a multiscale sampling approach for Dirichlet process mixture models. The multiscale algorithm is applicable to data with spatial structure. The method is tested on synthetic data and on videos from the MPEG-4 benchmark set.
Peter Orbanz, Samuel Braendle, Joachim M. Buhmann
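To make the role of the Dirichlet process prior concrete, the following is a minimal sketch of collapsed Gibbs sampling for a generic Dirichlet process mixture of 1-D Gaussians in the Chinese restaurant process view. It is not the authors' video segmentation model or their multiscale sampler; all parameter names and values (the concentration `alpha`, observation variance `sigma2`, prior mean `mu0`, prior variance `tau02`) are illustrative assumptions. It only shows how the concentration parameter trades off opening new clusters against reusing existing ones, which is the mechanism controlling segment resolution.

```python
# Hypothetical illustration (not the paper's algorithm): collapsed Gibbs
# sampling for a Dirichlet process mixture of 1-D Gaussians with known
# observation variance, using the Chinese restaurant process representation.
import numpy as np

def dp_gibbs(x, alpha=1.0, sigma2=0.5, mu0=0.0, tau02=4.0, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for a conjugate DP Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.zeros(n, dtype=int)           # cluster assignments
    counts = {0: n}                       # cluster sizes
    sums = {0: float(x.sum())}            # per-cluster data sums

    def log_pred(xi, cnt, s):
        # Posterior predictive N(mu_n, tau_n^2 + sigma2) of a cluster with
        # cnt points summing to s (cnt = 0 gives the prior predictive).
        prec = 1.0 / tau02 + cnt / sigma2
        mu_n = (mu0 / tau02 + s / sigma2) / prec
        var = 1.0 / prec + sigma2
        return -0.5 * (np.log(2 * np.pi * var) + (xi - mu_n) ** 2 / var)

    for _ in range(n_iter):
        for i in range(n):
            # Remove point i from its current cluster.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x[i]
            if counts[k] == 0:
                del counts[k], sums[k]
            labels = list(counts)
            # CRP weights: existing clusters proportional to their size,
            # a new cluster proportional to alpha.
            logp = [np.log(counts[k2]) + log_pred(x[i], counts[k2], sums[k2])
                    for k2 in labels]
            logp.append(np.log(alpha) + log_pred(x[i], 0, 0.0))
            logp = np.array(logp)
            p = np.exp(logp - logp.max())
            p /= p.sum()
            choice = rng.choice(len(labels) + 1, p=p)
            if choice == len(labels):     # open a new cluster
                k_new = max(counts) + 1 if counts else 0
                counts[k_new], sums[k_new] = 0, 0.0
            else:
                k_new = labels[choice]
            z[i] = k_new
            counts[k_new] += 1
            sums[k_new] += x[i]
    return z

# Example: two well-separated groups; larger alpha tends to yield more clusters.
data = np.concatenate([np.random.normal(-3, 0.5, 50), np.random.normal(3, 0.5, 50)])
print(np.unique(dp_gibbs(data), return_counts=True))
```

In the paper's setting the observations would be image features with spatial and temporal structure rather than independent scalars, and inference would additionally exploit the conjugate propagation of context over time; the sketch above only conveys the basic DP mixture sampling step.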