In tracking face and facial actions of unknown people, it is essential to take into account two components of facial shape variations: shape variation between people and variation caused by different facial actions such as facial expressions. This paper presents a monocular method of tracking faces and facial actions using a multilinear face model that treats interpersonal and intrapersonal shape variations separately. We created this method using a multilinear face model by integrating two different frameworks: particle filter-based tracking for time-dependent facial action and pose estimation and incremental bundle adjustment for person-dependent shape estimation. This unique combination together with multilinear face models is the key to tracking faces and facial actions of arbitrary people in real time with no pre-learned individual face models. Experiments using real video sequences demonstrate the effectiveness of our method.