This paper presents an algorithm for segmenting a speaker's lips and extracting lip features. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space. Second, a statistical approach using Markov random field modelling determines the prevailing lip region and its motion in a spatiotemporal neighbourhood. Third, the final label field is used to extract the ROI (Region Of Interest) and geometrical features.
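As a rough illustration of the first step only, the sketch below maps an RGB pixel to a (hue, intensity) pair in the logarithmic domain. The abstract does not give the transform's formula, so the definitions used here are assumptions: hue is taken as the log of the green-to-red ratio, intensity as the log of the mean of the three channels, and the function names and the `eps` offset are hypothetical.

```python
import math


def log_hue_intensity(r, g, b, eps=1.0):
    """Map one 8-bit RGB pixel to (hue, intensity) in the log domain.

    Assumed definitions, not taken from the paper:
      hue       = log(g + eps) - log(r + eps)   (log green/red ratio)
      intensity = log(mean(r, g, b) + eps)
    eps avoids log(0) on black pixels.
    """
    hue = math.log(g + eps) - math.log(r + eps)
    intensity = math.log((r + g + b) / 3.0 + eps)
    return hue, intensity


def transform_image(rgb_rows):
    """Apply the pixel transform to an image given as rows of (r, g, b) tuples."""
    return [[log_hue_intensity(*px) for px in row] for row in rgb_rows]


# Illustrative pixel values only (made up, not measured data).
skin = log_hue_intensity(200, 150, 130)
lip = log_hue_intensity(180, 90, 100)
```

Under these assumed definitions, a pixel with a lower green-to-red ratio gets a lower hue value, which is one way a redder lip pixel could be separated from surrounding skin before any labelling step.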