In this paper we introduce a novel contextual fusion method to improve the detection scores of semantic concepts in images and videos. Our method consists of three phases. For each individual concept, the prior probability of the concept is incorporated with detection score of an individual SVM detector. Then probabilistic estimates of the target concept are computed using all of the individual SVM detectors. Finally, these estimates are linearly combined using weights learned from the training set. This procedure is applied to each target concept individually. We show significant improvements to our detection scores on the TRECVID 2005 development set and LSCOM-Lite annotation set. We achieved on average +3.9% improvements in 29 out of 39 concepts.