A sketch captures the most informative part of an object in a much more concise and potentially more robust representation (e.g., for face recognition or for new ways of manipulating faces). We have previously developed a framework for generating face sketches from still images. A more interesting question is whether we can generate an animated sketch from video. For face representation, we adopt the same hierarchical compositional graph model originally developed for still images, where each graph node corresponds to a multimodal model of a certain facial feature (e.g., closed mouth, open mouth, and wide-open mouth). To enforce temporal-spatial consistency and improve tracking efficiency, we constrain each graph node to transition only between immediately neighboring modes (e.g., from closed mouth to open mouth, but not to wide-open mouth), and we further constrain it by its corresponding parts in the neighboring frames. To improve the matching accuracy, we model the local structure of a given mode as a sha...