In a lip-reading system, a key issue is how to extract visual features, since they strongly affect both recognition accuracy and efficiency. In this paper, we propose a novel motion-based visual feature representation. Unlike existing methods, which use all pixels around the lip contours for each utterance, our approach focuses on the crucial parts of the lips involved in movement and captures the motion track of each part. Distinctive feature vectors are then built to represent the whole lip-motion process of a given utterance, rather than separate frame images. Experimental results demonstrate the efficacy of the proposed approach.
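The abstract does not say which lip parts are tracked or how their motion tracks are encoded. The following minimal sketch assumes that the (x, y) positions of a few key points have already been located in every frame. It shows one possible way, not necessarily the authors' method, to turn variable-length motion tracks into a single fixed-size, utterance-level vector. Each track is expressed relative to its first-frame position and linearly resampled to a fixed number of steps. The function names `resample_track` and `motion_feature` and the parameter `n` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def resample_track(track: List[Point], n: int) -> List[Point]:
    """Linearly resample a 2-D trajectory to exactly n points (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if len(track) == 1:
        return [track[0]] * n
    last = len(track) - 1
    out = []
    for i in range(n):
        pos = i * last / (n - 1)      # fractional index into the original track
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        x = track[lo][0] + frac * (track[hi][0] - track[lo][0])
        y = track[lo][1] + frac * (track[hi][1] - track[lo][1])
        out.append((x, y))
    return out


def motion_feature(frames: List[List[Point]], n: int = 8) -> List[float]:
    """Build one utterance-level feature vector from per-frame key points.

    Assumption: frames[t][k] is the (x, y) position of key point k in frame t.
    Each key point's track is made relative to its first-frame position,
    resampled to n steps, and all tracks are concatenated, so the vector
    length is (number of key points) * n * 2 regardless of utterance length.
    """
    num_points = len(frames[0])
    feature: List[float] = []
    for k in range(num_points):
        track = [frame[k] for frame in frames]
        x0, y0 = track[0]
        relative = [(x - x0, y - y0) for x, y in track]
        for dx, dy in resample_track(relative, n):
            feature.extend([dx, dy])
    return feature


if __name__ == "__main__":
    # Two hypothetical key points tracked over three frames.
    frames = [
        [(0.0, 0.0), (10.0, 5.0)],
        [(1.0, 2.0), (10.0, 6.0)],
        [(2.0, 4.0), (10.0, 7.0)],
    ]
    print(motion_feature(frames, n=5))
```

Because every utterance maps to a vector of the same length, whatever its frame count, the result can be fed directly to a standard classifier. Relative displacements emphasize how each part moves rather than where the lips sit in the image.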