This paper presents a real-time video surveillance system capable of tracking multiple humans simultaneously. To handle challenging conditions such as occlusions, abrupt motion changes, and confusion between nearby people, we propose an intelligent fusion framework in which multiple cues are combined to estimate the optimal object states, with more reliable cues exerting greater influence on the final decision. Furthermore, part-based human tracking provides a second level of information fusion: parts with weak observability can be compensated for by tracking other, more visible parts, which makes the approach particularly effective for highly articulated objects such as humans.
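The two fusion levels can be illustrated with a minimal sketch. It assumes that cue-level fusion is log-linear pooling of per-cue likelihoods over candidate states, with normalized reliability weights, and that part-level fusion is a visibility-weighted average of each part's vote for the body center. The function names `fuse_cues` and `fuse_parts`, the pooling rule, and the way reliabilities and visibilities are obtained are hypothetical illustrations, not the system's actual formulation.

```python
import numpy as np


def fuse_cues(likelihoods, reliabilities):
    """Reliability-weighted log-linear pooling of cue likelihoods.

    ASSUMPTION (illustrative only): each cue yields a positive likelihood per
    candidate state, and each cue has a non-negative reliability score.
    likelihoods: dict cue_name -> array of shape (n_candidates,)
    reliabilities: dict cue_name -> float
    Returns a normalized posterior over the candidate states.
    """
    names = list(likelihoods)
    w = np.array([reliabilities[n] for n in names], dtype=float)
    if w.sum() <= 0:
        raise ValueError("at least one cue must have positive reliability")
    w = w / w.sum()  # weights sum to 1, so reliable cues dominate the product
    log_post = sum(wi * np.log(np.asarray(likelihoods[n], dtype=float))
                   for wi, n in zip(w, names))
    log_post = log_post - log_post.max()  # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()


def fuse_parts(part_center_votes, visibilities):
    """Visibility-weighted estimate of the body center from part votes.

    ASSUMPTION (illustrative only): each part votes for the body center in
    image coordinates, and occluded parts get visibility near 0, so visible
    parts compensate for them.
    part_center_votes: array of shape (n_parts, 2)
    visibilities: array of shape (n_parts,)
    """
    votes = np.asarray(part_center_votes, dtype=float)
    v = np.asarray(visibilities, dtype=float)
    if v.sum() <= 0:
        raise ValueError("at least one part must be visible")
    return (v[:, None] * votes).sum(axis=0) / v.sum()


# Usage: the reliable "color" cue outweighs the unreliable "motion" cue,
# and the fully occluded third part does not affect the center estimate.
post = fuse_cues({"color": np.array([0.7, 0.2, 0.1]),
                  "motion": np.array([0.1, 0.3, 0.6])},
                 {"color": 0.9, "motion": 0.1})
center = fuse_parts([[10, 20], [12, 22], [50, 80]], [1.0, 1.0, 0.0])
```

Log-linear pooling is chosen here because a weight near zero effectively silences an unreliable cue, while a cue with weight near one dominates the decision; a weighted sum of likelihoods would be an equally plausible alternative.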