Pervasive multimedia devices require accurate video retargeting, especially on connected consumer electronics platforms. In this paper, we present a context-assisted spatial-temporal grid scheme for consumer video retargeting. First, we parse consumer videos from low-level features to high-level visual concepts and combine these with visual attention to obtain a more accurate description of importance. Then, we build a semantic importance map that represents spatial importance and temporal continuity. This map is combined with a 3D rectilinear grid scaleplate to map frames to the target display, preserving both the aspect ratio of semantically salient objects and perceptual coherency. Extensive evaluations were conducted on two popular video genres, sports and advertisements. Comparisons with state-of-the-art approaches on both images and videos demonstrate the advantages of the proposed approach.

Categories and Subject Descriptors
I.2.10 [Vision and Scene Understanding]: Video analysis...