This paper introduced a human surveillance system that integrates face understanding technologies to recognize personal identities in real time. We proposed a coarse-to-fine strategy to locate faces quickly and accurately: global facial features are used to reduce the search area, while local features are used to refine the face positions. Continuously tracking the detected faces also helps detection in the next frame. Finally, facial features are estimated, and the results from previous frames are combined to make the system more stable and more robust to unexpected human activities. In an on-line experiment, more than 60 persons were trained in a natural environment, and the system recognized each of them effectively in terms of both correctness and performance.
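The idea of combining results from previous frames can be illustrated with a minimal sketch. It assumes a simple majority vote over a sliding window of per-frame identity labels. The paper does not specify the combination rule, so the class name `TemporalVoter`, the `window` parameter, and the voting scheme itself are hypothetical choices made only for illustration.

```python
from collections import Counter, deque


class TemporalVoter:
    """Smooths per-frame identity labels for one tracked face.

    Assumption: majority voting over the last `window` frames. This is an
    illustrative stand-in, not the combination rule used by the system.
    """

    def __init__(self, window=5):
        # Keep only the most recent `window` per-frame labels.
        self.history = deque(maxlen=window)

    def update(self, label):
        """Record the label for the current frame and return the smoothed label."""
        self.history.append(label)
        # Return the label that occurs most often in the current window.
        return Counter(self.history).most_common(1)[0][0]


# Example: a single misrecognized frame ("bob") does not change the output.
voter = TemporalVoter(window=5)
frame_labels = ["alice", "alice", "bob", "alice", "alice"]
smoothed = [voter.update(label) for label in frame_labels]
```

With a vote of this kind, an isolated per-frame error caused by a sudden head turn or a brief occlusion is outvoted by the surrounding frames, which is one plausible way such a system could gain stability.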