We present a video summarization technique based on supervised learning. Within a class of videos of similar nature, the user provides the desired summaries for a subset of the videos. Based on this supervised information, summaries for the other videos in the class are generated. We derive frame-transitional features and represent each frame transition as a state. We then formulate a loss functional that quantifies the discrepancy between the state-transition probabilities of the original video and those of the intended summary video, and optimize this functional. We validate the technique experimentally using cross-validation scores on two different classes of videos, and demonstrate that it produces high-quality summaries that capture user perception.
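To make the loss functional concrete, the following is a minimal sketch of one plausible form; the transition matrices $P^{\mathrm{orig}}$ and $P^{\mathrm{sum}}(S)$, the state set $\mathcal{S}$, and the squared-error penalty are illustrative assumptions, not the paper's stated formulation:
\[
\mathcal{L}(S) \;=\; \sum_{i \in \mathcal{S}} \sum_{j \in \mathcal{S}} \Bigl( P^{\mathrm{orig}}_{ij} \;-\; P^{\mathrm{sum}}_{ij}(S) \Bigr)^{2},
\]
where $S$ denotes the selected subset of frames, $P_{ij}$ is the probability of transitioning from state $i$ to state $j$, and the summary is obtained by minimizing $\mathcal{L}$ over admissible selections $S$.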