This paper proposes a novel method for robust, automatic, real-time head tracking that fuses face and head cues within a multi-state particle filter. Because the appearance of the human head varies widely, most existing head tracking methods use little object-specific prior knowledge, which limits their discriminative power. The face, in contrast, is a distinct pattern that is much easier to capture. This motivates us to incorporate a vector-boosted multi-view face detector[6] to strongly support general head observation cues, including color and contour edges. To perform temporal inference of the face state and the head state simultaneously and collaboratively, we construct a Markov-network-based particle filter using sequential belief propagation Monte Carlo[5]. Our approach is tested on sequences used by previous researchers as well as on new data sets that include many challenging real-world cases, and it shows robustness against various unfavorable conditions.