This work introduces a new pattern recognition model for segmenting and tracking lip contours in video sequences. We formulate the problem as general nonrigid object tracking, in which the expected segmentation is computed with respect to a filtering distribution. Computing this expectation directly is difficult because it ranges over the whole parameter space of segmentations. We therefore estimate the expected segmentation with sequential Monte Carlo sampling methods, in which the filtering distribution is approximated by a proposal distribution from which samples are drawn. The key contribution of this paper is the formulation of this proposal distribution using a new observation model based on deep belief networks and a new transition model. The efficacy of the model is demonstrated on publicly available databases of video sequences of people talking and singing. Our method produces results comparable to those of state-of-the-art models, while showing potential to be more robust to...