Human vision system actively seeks interesting regions in images to reduce the search effort in tasks, such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract human’s first sight than their surrounding neighbors. In this paper, we propose a spatiotemporal video attention detection technique for detecting the attended regions that correspond to both interesting objects and actions in video sequences. Both spatial and temporal saliency maps are constructed and further fused in a dynamic fashion to produce the overall spatiotemporal attention model. In the temporal attention model, motion contrast is computed based on the planar motions (homography) between images, which is estimated by applying RANSAC on point correspondences in the scene. To compensate the non-uniformity of spatial distribution of interest-points, spanning areas of motion segments are incorporated in the motion contrast computation. In the spatial attenti...