This paper presents a gesture based driven approach for video editing. Given a lecture video, we adopt novel approaches to automatically detect and synchronize its content with electronic slides. The gestures in each synchronized topic (or shot) are then tracked and recognized continuously. By registering shots and slides and recovering their transformation, the regions where the gestures take place can be known. Based on the recognized gestures and their registered positions, the information in slides can be seamlessly extracted, not only to assist video editing, but also to enhance the quality of original lecture video.