Manual labeling of objects in videos is a tedious task. We present an approach that automatically propagates labels from a single frame to subsequent frames. We tackle the challenging problem of tracking segmented regions by combining keypoint tracking with a multiple-region matching strategy based on inclusion similarity and connected regions. We ran experiments on a 101-frame driving video sequence, for which we produced the corresponding hand-labeled ground truth. We make this dataset available to the research community. We show that our technique can accommodate variations in segmentation (and correct them), even in the presence of multiple independent motions and partial occlusion. Results show that most of the labeled pixels can be correctly propagated even after a hundred frames. Because propagation remains reliable over many frames, it can greatly reduce the user effort required for video object labeling.
Julien Fauqueur, Gabriel J. Brostow, Roberto Cipol