We address the problem of weakly supervised semantic segmentation. The training images are labeled only by the classes they contain, not by their location in the image. On test images instead, the method must predict a class label for every pixel. Our goal is to enable segmentation algorithms to use multiple visual cues in this weakly supervised setting, analogous to what is achieved by fully supervised methods. However, it is difficult to assess the relative usefulness of different visual cues from weakly supervised training data. We define a parametric family of structured models, where each model weighs visual cues in a different way. We propose a Maximum Expected Agreement model selection principle that evaluates the quality of a model from the family without looking at superpixel labels. Searching for the best model is a hard optimization problem, which has no analytic gradient and multiple local optima. We cast it as a Bayesian optimization problem and propose an algorithm bas...
Alexander Vezhnevets, Vittorio Ferrari, Joachim M.