In this paper, we consider the problem of super-resolving a human face video by a very high (×16) zoom factor. Inspired by recent literature on hallucination and example-based learning, we formulate this task using a graphical model that encodes 1) spatio-temporal consistencies, and 2) image formation and degradation processes. A video database of facial expressions is used to learn a domain-specific prior for high-resolution videos. The problem is posed as one of probabilistic inference, in which we aim to find the high-resolution video that best satisfies the constraints expressed through the graphical model. Traditional approaches to this problem using video data first estimate the relative motion between frames and then compensate for it, effectively resulting in multiple measurements of the scene. Our use of time is rather direct: we define data structures that span multiple consecutive frames, enriching our feature vectors with a temporal signature. We then exploit these signatures...
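To make the multi-frame data structures concrete, the following is a minimal sketch of a spatio-temporal feature built by stacking co-located low-resolution patches from a short window of consecutive frames; the function and parameter names (extract_spatiotemporal_feature, patch_size, temporal_window) are illustrative assumptions, not the implementation described in the paper.

```python
import numpy as np

def extract_spatiotemporal_feature(frames, center_t, y, x,
                                   patch_size=8, temporal_window=3):
    """Illustrative sketch (assumed, not the paper's code): build a feature
    vector for the patch at (y, x) by stacking co-located patches from
    `temporal_window` consecutive low-resolution frames, so the descriptor
    carries a temporal signature in addition to its spatial appearance.

    frames: array of shape (T, H, W), grayscale video
    center_t: index of the frame being super-resolved
    Returns a 1-D vector of length patch_size**2 * temporal_window.
    """
    half = temporal_window // 2
    patches = []
    for t in range(center_t - half, center_t + half + 1):
        t = int(np.clip(t, 0, frames.shape[0] - 1))   # replicate frames at clip boundaries
        patch = frames[t, y:y + patch_size, x:x + patch_size]
        patches.append(patch.astype(np.float32).ravel())
    feature = np.concatenate(patches)
    # Zero-mean normalization so matching against exemplars is
    # insensitive to local brightness changes.
    return feature - feature.mean()

# Example: a 30-frame, 64x64 low-resolution clip
video = np.random.rand(30, 64, 64).astype(np.float32)
f = extract_spatiotemporal_feature(video, center_t=10, y=16, x=24)
print(f.shape)  # (192,) = 8 * 8 * 3
```

Such a stacked descriptor could then be matched against exemplars drawn from the high-resolution training videos, which is one plausible way the temporal signature enters the inference; the exact matching scheme is left to the body of the paper.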