In this paper, we present a novel framework for customizing multimedia messages for mobile users. The goal is to generate a video message from a series of pictures. The framework comprises four stages: visual attention view detection, image grouping, image ranking, and slideshow generation. Given the limitations of mobile devices, we use a simple attention model based on color features to detect interesting regions in the images. We then group the images and rank them according to the similarity of their attention views. Finally, a slideshow based on human perception is designed to keep the mobile user's eyes on the attention regions efficiently. In addition, a short piece of music is selected to match the video message. Extensive experiments and user studies show the promising performance of the proposed system.
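As a rough illustration of the attention view detection and ranking stages, the following minimal sketch shows one way a lightweight color-based pipeline could look. It is not the implementation evaluated in this work: the saliency measure (distance to the mean image color), the bounding-box attention view, the histogram-intersection similarity, and all function names are assumptions chosen for simplicity. Images are assumed to be nested lists of `(r, g, b)` tuples.

```python
from statistics import mean


def saliency_map(img):
    """Per-pixel saliency: Euclidean distance to the image's mean color (assumed measure)."""
    pixels = [p for row in img for p in row]
    mu = [mean(p[c] for p in pixels) for c in range(3)]
    return [[sum((p[c] - mu[c]) ** 2 for c in range(3)) ** 0.5 for p in row]
            for row in img]


def attention_view(img):
    """Inclusive bounding box (top, left, bottom, right) of above-average-saliency pixels."""
    sal = saliency_map(img)
    thresh = mean(v for row in sal for v in row)
    hits = [(y, x) for y, row in enumerate(sal)
            for x, v in enumerate(row) if v > thresh]
    if not hits:  # uniform image: fall back to the whole frame
        return (0, 0, len(img) - 1, len(img[0]) - 1)
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))


def view_histogram(img, box, bins=4):
    """Normalized joint RGB histogram of the pixels inside the attention view."""
    top, left, bottom, right = box
    step = 256 // bins
    hist = [0.0] * bins ** 3
    count = 0
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            r, g, b = img[y][x]
            hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
            count += 1
    return [h / count for h in hist]


def similarity(h1, h2):
    """Histogram intersection in [0, 1]; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_view_similarity(images):
    """Return image indices ordered by mean attention-view similarity to the other images."""
    if len(images) < 2:
        return list(range(len(images)))
    hists = [view_histogram(img, attention_view(img)) for img in images]
    scores = [mean(similarity(h, other) for j, other in enumerate(hists) if j != i)
              for i, h in enumerate(hists)]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
```

Under these assumptions, images whose attention views share similar colors rank ahead of outliers, which is one plausible reading of ranking "according to the similarity of their attention views". A grouping step could reuse `similarity` with a threshold, though that choice is likewise hypothetical.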