Spatiotemporal segmentation is an essential task for video analysis. The strong interconnection between finding an object's spatial support and finding its motion characteristics makes the problem particularly challenging. Motivated by closure detection techniques in 2D images, this paper introduces the concept of spatiotemporal closure. Treating the spatiotemporal volume as a single entity, we extract contiguous "tubes" whose overall surface is supported by strong appearance and motion discontinuities. Formulating our closure cost over a graph of spatiotemporal superpixels, we show how it can be globally and efficiently minimized using the parametric maxflow framework. The resulting approach automatically recovers coherent spatiotemporal components, corresponding to objects, object parts, and object unions, providing a good set of multiscale spatiotemporal hypotheses for high-level video analysis.
Alex Levinshtein, Cristian Sminchisescu, Sven J. D