This paper presents an approach that uses Canonical Correlation Analysis (CCA) for monocular 3D face pose and facial animation estimation. CCA is used to learn the dependency between texture residuals and the 3D face pose and facial gesture parameters. The texture residuals are computed from observed shape-free 2D image patches of raw brightness values, which we build by means of a parameterized 3D geometric face model. The method estimates the pose of the face and the model's animation parameters controlling the lip, eyebrow, and eye movements (encoded in 15 parameters). Extensive experiments on tracking faces in long real video sequences show the effectiveness of the proposed method and the value of using CCA in the tracking context.
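
To make the role of CCA concrete, the following is a minimal sketch of how a linear mapping from texture residuals to parameter updates could be learned with CCA. It is an illustration under stated assumptions, not the training procedure of the paper: the data are synthetic, the residual dimension (40) and sample counts are arbitrary, the output dimension of 15 is borrowed from the parameter count above purely as an example size, and the names `inv_sqrt`, `fit_cca_regressor`, and `predict` are hypothetical.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T


def fit_cca_regressor(X, Y, reg=1e-6):
    """Learn a CCA-based linear map from residuals X (n, p) to parameters Y (n, q).

    Assumption: a small ridge term `reg` keeps the covariances invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both sets; the SVD of the whitened cross-covariance gives the
    # canonical directions and the canonical correlations.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, corr, _ = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U  # canonical directions for the residuals, shape (p, k)
    # Regress the parameters on the canonical variates of the residuals.
    G, *_ = np.linalg.lstsq(Xc @ A, Yc, rcond=None)
    return {"mx": mx, "my": my, "A": A, "G": G, "corr": corr}


def predict(model, X):
    """Map texture residuals to estimated parameter perturbations."""
    return (X - model["mx"]) @ model["A"] @ model["G"] + model["my"]


# Synthetic stand-in for training data (assumption): residuals depend
# linearly on the parameter perturbations, plus a little noise.
rng = np.random.default_rng(0)
n_train, n_test, p, q = 400, 100, 40, 15
M = rng.normal(size=(q, p))
Y_all = rng.normal(size=(n_train + n_test, q))
X_all = Y_all @ M + 0.01 * rng.normal(size=(n_train + n_test, p))

X_train, Y_train = X_all[:n_train], Y_all[:n_train]
X_test, Y_test = X_all[n_train:], Y_all[n_train:]

model = fit_cca_regressor(X_train, Y_train)
rmse = np.sqrt(np.mean((predict(model, X_test) - Y_test) ** 2))
print("smallest canonical correlation:", model["corr"].min())
print("held-out RMSE:", rmse)
```

In a tracking setting, one would presumably generate the training pairs by perturbing the model parameters around a reference configuration and recording the resulting texture residuals, then apply the learned map to the residual observed in each frame. That usage is an assumption of this sketch rather than a detail stated above.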