Abstract— Video summarization has gained popularity in emerging multimedia communication applications; however, very limited work has addressed the problem of transmitting video summary frames. In this paper, we propose a cross-layer optimization framework for delivering video summaries over wireless networks. Within a rate-distortion theoretical framework, source coding, allowable retransmissions, and adaptive modulation and coding are jointly optimized, which reflects the joint selection of parameters at the physical, data link, and application layers. The goal is to achieve the best video quality and content coverage of the received summary frames while meeting the delay constraint. The problem is solved using Lagrangian relaxation and dynamic programming. Experimental results indicate the effectiveness and efficiency of the proposed optimization framework, especially when the delay budget imposed by the upper layer applications is small, where mo...