This paper presents a unified system for segmenting and tracking the face and hands in sign language recognition using a single camera. Unlike much related work, which relies on coloured gloves, we detect skin by combining three features: colour, motion and position. Together, these features select the skin-coloured pixels that are likely to belong to the foreground and that lie within a predicted position range. We extend previous research on occlusion detection to handle occlusion between any of the skin objects using a Kalman-filter-based algorithm. Tracking improves segmentation by reducing the search space, and segmentation in turn improves the overall tracking process. The algorithm is tested on several video sequences from a standard database and achieves a very low error rate.
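
To make the combination of the three cues concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes a simple rule-based RGB skin test, frame differencing as the motion cue, and a square search window centred on a predicted position. All function names, thresholds and the window shape are hypothetical choices made for illustration. The prediction shown is only the state propagation of a constant-velocity model; a full Kalman filter would also propagate the covariance and correct the state with each new measurement.

```python
def is_skin(rgb):
    """Rule-based RGB skin test (assumed stand-in for the colour cue)."""
    r, g, b = rgb
    return (r > 95 and g > 40 and b > 20
            and max(rgb) - min(rgb) > 15
            and abs(r - g) > 15 and r > g and r > b)


def has_moved(prev_rgb, curr_rgb, threshold=30):
    """Frame differencing: summed absolute channel change (assumed motion cue)."""
    return sum(abs(c - p) for p, c in zip(prev_rgb, curr_rgb)) > threshold


def predict_centre(centre, velocity):
    """Constant-velocity state propagation: next position = position + velocity.

    This is only the state part of a Kalman predict step; covariance
    propagation and the measurement update are omitted in this sketch.
    """
    return (centre[0] + velocity[0], centre[1] + velocity[1])


def in_search_window(row, col, centre, half_size):
    """Position cue: pixel lies inside a square window around the prediction."""
    return abs(row - centre[0]) <= half_size and abs(col - centre[1]) <= half_size


def segment(prev_frame, curr_frame, centre, half_size):
    """Return a binary mask that is 1 only where all three cues agree.

    Frames are lists of rows, each row a list of (r, g, b) tuples.
    """
    mask = []
    for row, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        mask.append([
            int(is_skin(curr)
                and has_moved(prev, curr)
                and in_search_window(row, col, centre, half_size))
            for col, (prev, curr) in enumerate(zip(prev_row, curr_row))
        ])
    return mask


# Usage on a tiny 3x3 synthetic frame pair (hypothetical pixel values).
SKIN, BACKGROUND = (200, 140, 120), (30, 30, 30)
prev_frame = [[BACKGROUND] * 3 for _ in range(3)]
curr_frame = [[BACKGROUND] * 3 for _ in range(3)]
prev_frame[2][2] = SKIN   # static skin-coloured pixel: no motion
curr_frame[2][2] = SKIN
curr_frame[1][1] = SKIN   # moving skin pixel at the predicted position
curr_frame[0][0] = SKIN   # moving skin pixel away from the prediction

predicted = predict_centre(centre=(1, 0), velocity=(0, 1))
tight_mask = segment(prev_frame, curr_frame, predicted, half_size=0)
wide_mask = segment(prev_frame, curr_frame, predicted, half_size=1)
```

With the tight window, only the pixel at the predicted position survives, which illustrates how the predicted position range shrinks the search space. Widening the window admits the second moving skin pixel, while the static skin-coloured pixel is rejected by the motion cue in both cases.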