The generative modeling approach to video segmentation has recently been gaining popularity in the computer vision community. For example, the flexible sprites framework has been studied in [11,13,14,24], among other references. In general, detailed generative models are vulnerable to intractable inference and, when approximations are made, to local minima (see, e.g., [25]). Recent approaches to these problems have focused on inference techniques for increasingly expressive models. Simpler models, on the other hand, while less precise, are often not only faster but also less prone to local minima. In addition, although many different models may be based on similar hidden variables, some models are more amenable to inference of certain shared variables, while others lead to efficient and accurate inference of other components of the hierarchical data description. In this paper, we empirically illustrate that forcing multiple models to share the posterior ...
Nebojsa Jojic, John M. Winn, Larry Zitnick