Meetings are an integral part of business life for any organization. In previous work, we developed a physical awareness system called CAMEO (Camera Assisted Meeting Event Observer) to record and process the audio/visual information of a meeting. An important task in meeting understanding is to know who is attending the meeting and how many people are present. In this paper, we present an automatic approach to detect, track, and cluster people's faces in long video sequences. This is a challenging problem due to the variability in the appearance of people's faces (illumination, expression, pose, etc.). Two main novelties are presented: (1) a robust, real-time, adaptive subspace face tracker that combines color and appearance cues, and (2) a temporal subspace clustering algorithm. The effectiveness and robustness of the proposed system are demonstrated on a data set of long videos (i.e., 1 hour).