This paper addresses the face hallucination problem of converting thermal infrared face images into photo-realistic ones. The task is challenging because the two modalities differ dramatically, which makes many existing linear models inapplicable. We propose a learning-based framework that synthesizes the normal face from the infrared input. Compared with previous work, we further exploit local linearity not only in the image spatial domain but also in the image manifolds. We also develop a measure of the variance between an input and its prediction, which allows us to apply a Markov random field model to the predicted normal face and improve the hallucination result. Experimental results show the advantage of our algorithm over existing methods. The algorithm can also be readily generalized to other multi-modal image conversion problems.