In this paper we propose a system that annotates a user-generated video based on its associated location metadata by exploiting user-tagged image databases. An example of such a database is a photo-sharing website such as Flickr [1], where users upload their images and annotate them with various tags. The goal is to find the tags that have a high probability of being relevant to the video without performing any complex object or action recognition on the video sequence. A video is first segmented into camera views, and a set of keyframes is selected to represent the video. We describe the concept of the camera view as the basic element of user-generated videos, which has special properties that make it suitable for video annotation. The keyframes are used to retrieve the most relevant images in the database. A "tag processing" step is then used to tag the video.
Golnaz Abdollahian, Edward J. Delp
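
To make the final "tag processing" step more concrete, it can be pictured as a similarity-weighted vote over the tags of the retrieved images. The following is only a minimal sketch under stated assumptions, not the method of the paper: the function name `rank_tags`, the `(similarity, tags)` input format, and the weighted-vote scoring rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def rank_tags(retrieved, top_k=3):
    """Rank candidate tags by a similarity-weighted vote (illustrative only).

    retrieved: list of (similarity, tags) pairs, one per database image
               retrieved for any keyframe of the video. Both the format and
               the similarity values are assumptions of this sketch.
    Returns up to top_k (tag, score) pairs, where score is the fraction of
    the total similarity mass contributed by images carrying that tag.
    """
    scores = defaultdict(float)
    total = 0.0
    for similarity, tags in retrieved:
        total += similarity
        # Count each tag at most once per image.
        for tag in set(tags):
            scores[tag] += similarity
    if total == 0:
        return []
    # Normalize, then sort by descending score, breaking ties by tag name.
    ranked = sorted(
        ((tag, score / total) for tag, score in scores.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_k]


# Hypothetical retrieval results pooled over all keyframes of one video.
example = [
    (0.9, ["bridge", "river"]),
    (0.6, ["bridge", "sunset"]),
    (0.5, ["river", "boat"]),
]
top_tags = rank_tags(example)
```

In this toy input, "bridge" ranks first because the two images that carry it are also the two most similar to the keyframes. A tag attached to a single weakly matching image, such as "boat", falls outside the top three.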