This paper presents an unsupervised learning approach to video-based face recognition that makes no assumptions about pose or expression and requires no prior localization of facial landmarks. The proposed algorithm exploits spatiotemporal information obtained from local features that are extracted from arbitrary keypoints on faces rather than from pre-defined landmarks. Because it relies on local features, the algorithm is inherently robust to large-scale occlusions. During unsupervised learning, faces from a video sequence are automatically clustered based on the similarity of their local features, and a voting-based algorithm is employed to pick the representative features of each cluster. During recognition, the video frames of a probe are sequentially matched to the clusters of all individuals in the gallery, and the probe's identity is decided on the basis of the best temporally cohesive cluster matches. The proposed algorithm can also detect sudden identity changes in video by utilizing the tempor...
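The recognition step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the distance threshold `max_dist`, the helper names `match_count` and `identify`, the representation of local features as plain coordinate tuples, and the use of a simple running sum of per-frame scores as a stand-in for temporal cohesion are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def match_count(frame_feats, cluster_feats, max_dist=0.5):
    """Count frame features lying within max_dist of some cluster feature.

    max_dist is a hypothetical threshold chosen for this sketch.
    """
    count = 0
    for f in frame_feats:
        nearest = min(math.dist(f, c) for c in cluster_feats)
        if nearest <= max_dist:
            count += 1
    return count


def identify(probe_frames, gallery, max_dist=0.5):
    """Match each probe frame to every gallery cluster in sequence.

    For each frame, an identity is credited with the score of its
    best-matching cluster; scores are summed over frames (an assumed,
    simplified proxy for temporally cohesive matching).
    """
    totals = defaultdict(int)
    for frame in probe_frames:
        for identity, clusters in gallery.items():
            totals[identity] += max(
                match_count(frame, c, max_dist) for c in clusters
            )
    return max(totals, key=totals.get), dict(totals)


# Toy gallery: identity -> list of clusters of representative features.
gallery = {
    "A": [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 5.0)]],
    "B": [[(10.0, 10.0), (11.0, 11.0)]],
}
# Toy probe: two frames, each a list of local feature vectors.
probe = [[(0.1, 0.0), (1.0, 0.9)], [(5.1, 5.0), (0.0, 0.1)]]
best, totals = identify(probe, gallery)
```

In this toy example the probe accumulates matches only against the clusters of identity "A". A sudden drop in the per-frame score of the currently leading identity would be one plausible cue for an identity change, though the paper's actual criterion is not given in this excerpt.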