We propose a novel formulation of stereo matching that treats each pixel as a feature vector. Under this view, matching two or more images can be cast as matching point clouds in feature space. We build a nonparametric depth smoothness model in this space that correlates image features with depth values. This model induces a sparse graph that links pixels with similar features, thereby converting each point cloud into a connected network. This network defines a neighborhood system that captures pixel grouping hierarchies without resorting to image segmentation. We formulate global stereo matching over this neighborhood system and use graph cuts to match pixels between two or more such networks. We show that our stereo formulation recovers surfaces with different orders of smoothness, such as those with high-curvature details and sharp discontinuities. Furthermore, compared to other single-frame stereo methods, our method produces more temporally stable results from vide...
Brandon M. Smith, Hailin Jin, Li Zhang
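To make the idea of a sparse graph that links pixels with similar features more concrete, the following is a minimal sketch and not the authors' construction. It assumes a feature vector made of pixel position and color, and a brute-force k-nearest-neighbor rule for choosing links; the function names `pixel_features` and `knn_graph` and the parameters `spatial_weight` and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def pixel_features(image, spatial_weight=1.0):
    """Stack each pixel's (x, y) position and color into one feature vector.

    Assumption: position plus color is used as the feature; the abstract
    does not specify which features the paper uses.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Grayscale images give one color column, RGB images give three.
    color = image.reshape(h * w, -1).astype(np.float64)
    pos = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    return np.hstack([spatial_weight * pos, color])


def knn_graph(features, k=4):
    """Link every point to its k nearest neighbors in feature space.

    Returns a sorted list of undirected edges (i, j) with i < j.
    Brute force, O(n^2) memory: suitable only for small examples.
    """
    n = features.shape[0]
    sq = np.sum(features ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (features @ features.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    edges = set()
    for i in range(n):
        for j in nbrs[i]:
            j = int(j)
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)
```

On a small image with two flat regions of very different intensity, for example `knn_graph(pixel_features(image), k=2)`, the resulting edges stay within each region, which illustrates how a feature-space neighborhood can follow pixel groupings without an explicit segmentation step. The matching step over such networks (graph cuts in the abstract) is not sketched here.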