Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques