In this paper, we present a new technique based on feature localization for segmenting and tracking objects in videos. A video locale is a sequence of image feature locales that share similar features (color, texture, shape, and motion) in the spatiotemporal domain of a video. Image feature locales are grown from tiles (blocks of pixels); they need not be disjoint and need not be connected. To exploit the temporal redundancy in digital videos, two algorithms (intra-frame and inter-frame) are used to grow locales efficiently. Multiple motions are tracked by following each locale separately and performing tile-based dominant motion estimation on it.
James Au, Ze-Nian Li, Mark S. Drew
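
As a rough illustration of the ideas summarized in the abstract, the following minimal sketch groups tiles into locales by feature similarity alone, so that a locale may be spatially nonconnected, and then estimates one dominant motion per locale. It is not the authors' intra-frame or inter-frame algorithm: the function names (`tile_features`, `grow_locales`, `dominant_motion`), the use of mean color as the only feature, the greedy assignment rule, the threshold value, and the use of the median of per-tile motion vectors as the dominant motion are all assumptions made for illustration. Per-tile motion vectors are assumed to be supplied by some external block-matching step.

```python
import numpy as np

def tile_features(frame, tile=8):
    """Mean color of each tile (assumed feature; H and W must be multiples of `tile`)."""
    h, w, c = frame.shape
    blocks = frame.reshape(h // tile, tile, w // tile, tile, c)
    return blocks.mean(axis=(1, 3))  # shape (H/tile, W/tile, C)

def grow_locales(features, threshold=30.0):
    """Greedy grouping (an assumption, not the paper's method): each tile joins the
    first locale whose running mean feature lies within `threshold`, otherwise it
    starts a new locale. Spatial connectivity is deliberately not required."""
    rows, cols, _ = features.shape
    labels = np.full((rows, cols), -1, dtype=int)
    sums, counts = [], []  # per-locale feature sums and tile counts
    for r in range(rows):
        for c in range(cols):
            f = features[r, c].astype(float)
            for k in range(len(sums)):
                if np.linalg.norm(f - sums[k] / counts[k]) < threshold:
                    labels[r, c] = k
                    sums[k] = sums[k] + f
                    counts[k] += 1
                    break
            else:
                # No existing locale is close enough: start a new one.
                labels[r, c] = len(sums)
                sums.append(f.copy())
                counts.append(1)
    return labels

def dominant_motion(tile_vectors, labels, locale_id):
    """Median of the per-tile motion vectors belonging to one locale."""
    return np.median(tile_vectors[labels == locale_id], axis=0)
```

For example, on a 16 x 16 frame split into four 8 x 8 tiles arranged as a red/blue checkerboard, the two red tiles fall into one locale and the two blue tiles into another, even though the tiles of each locale touch only at a corner. Calling `dominant_motion` once per locale label then yields a separate motion estimate for each locale.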