Keyword-based video retrieval requires the semantic features of videos. Most work reported in the literature annotates each video shot with a fixed number of keywords, regardless of how much information the shot contains. In this paper, we propose a new approach that automatically annotates a video shot with an adaptive number of keywords, according to the richness of its content. First, a fixed-size Semantic Candidate Set (SCS) is discovered using visual features. The final annotation set, which has a variable number of keywords, is then obtained from the SCS by Bayesian inference, which combines static and dynamic inference to remove irrelevant candidate keywords. We have applied our approach to video retrieval. The experiments demonstrate that retrieval using our annotations outperforms retrieval using a fixed number of annotation words.
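
To illustrate the two-stage idea, the following is a minimal sketch under stated assumptions, not the method as implemented in this paper. It assumes each candidate keyword already has a visual-feature score, a prior, and two likelihoods of the observed evidence (given the keyword is present or absent). It does not model the combination of static and dynamic inference. All function names, parameters, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def candidate_set(scores: Dict[str, float], size: int) -> List[str]:
    """Stage 1 (assumed form): keep the `size` keywords with the highest
    visual-feature scores, giving a fixed-size candidate set."""
    return sorted(scores, key=scores.get, reverse=True)[:size]


def posterior(prior: float, lik_present: float, lik_absent: float) -> float:
    """Bayes' rule for a binary keyword variable:
    P(w | e) = P(e | w) P(w) / (P(e | w) P(w) + P(e | not w) (1 - P(w)))."""
    num = lik_present * prior
    den = num + lik_absent * (1.0 - prior)
    return num / den if den > 0.0 else 0.0


def prune(candidates: List[str],
          priors: Dict[str, float],
          likelihoods: Dict[str, Tuple[float, float]],
          threshold: float = 0.5) -> List[str]:
    """Stage 2 (assumed form): drop candidates whose posterior does not
    exceed `threshold`, so the result has a variable length."""
    kept = []
    for w in candidates:
        lik_present, lik_absent = likelihoods[w]
        if posterior(priors[w], lik_present, lik_absent) > threshold:
            kept.append(w)
    return kept


# Hypothetical toy values for one video shot.
scores = {"sky": 0.9, "car": 0.8, "tree": 0.6, "dog": 0.3}
priors = {"sky": 0.5, "car": 0.5, "tree": 0.5}
likelihoods = {"sky": (0.8, 0.2), "car": (0.3, 0.7), "tree": (0.6, 0.4)}

scs = candidate_set(scores, size=3)          # fixed size: 3 candidates
annotation = prune(scs, priors, likelihoods)  # variable size after pruning
print(scs, annotation)
```

In this toy case the candidate set always has three keywords, while the final annotation keeps only those whose posterior exceeds the threshold, so a shot with less supporting evidence ends up with fewer keywords.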