In this paper we explore the problem of accurately segmenting a person from a video given only the approximate location of that person. Unlike previous work, which assumes that the appearance model is known in advance, we develop an iterative expectation-sampling (ES) algorithm that solves segmentation and appearance modeling simultaneously. The appearance model is encoded as a kernel-based PDF defined in a joint color/path-length space. This appearance model remains unchanged over a short time period, even though the object can articulate. Thus, we can perform the ES iteration not only on a single frame but also over an image sequence. The algorithm is simple and efficient, and it gives visually good results.
Liang Zhao, Larry S. Davis
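To make the idea of a kernel-based PDF in a joint color/path-length space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a product of Gaussian kernels, a feature vector of the form [r, g, b, path_length] with the path length already computed per pixel, and hand-picked bandwidths; the function name `kde_joint` and all parameter names are hypothetical.

```python
import numpy as np

def kde_joint(samples, queries, bandwidths):
    """Evaluate a product-Gaussian kernel density estimate.

    samples:    (n, d) feature vectors, e.g. [r, g, b, path_length] (assumed layout)
    queries:    (m, d) points at which the density is evaluated
    bandwidths: (d,) per-dimension kernel widths (assumed, hand-picked values)
    """
    samples = np.asarray(samples, dtype=float)
    queries = np.asarray(queries, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    # Bandwidth-scaled differences between every query and every sample: (m, n, d).
    z = (queries[:, None, :] - samples[None, :, :]) / h
    # Exponent of the product of 1-D Gaussian kernels for each query/sample pair.
    log_k = -0.5 * np.sum(z ** 2, axis=2)
    # Normalization constant of the d-dimensional product kernel.
    norm = np.prod(h) * (2.0 * np.pi) ** (samples.shape[1] / 2.0)
    # Average the kernel contributions of all samples.
    return np.exp(log_k).sum(axis=1) / (samples.shape[0] * norm)

# Usage: two foreground samples, one query near them and one far away.
fg = np.array([[0.20, 0.30, 0.40, 0.10],
               [0.25, 0.30, 0.40, 0.15]])
qs = np.array([[0.20, 0.30, 0.40, 0.10],
               [0.90, 0.90, 0.90, 0.90]])
density = kde_joint(fg, qs, bandwidths=[0.1, 0.1, 0.1, 0.1])
```

In a scheme of this kind, pixels whose joint feature has a high density under the current sample set would be more likely to be labeled as belonging to the person, and the sample set would then be refreshed from the new labeling. How the ES iteration actually draws and weights samples is not specified in the abstract, so that step is left out here.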