The major scientific problem for content-based video retrieval is the semantic gap. Generally speaking, there are two ways to bridge the semantic gap: the first is from the human perspective (top-down) and the other is from the computer perspective (bottom-up). The top-down method defines a concept lexicon from the human perspective, trains a detector for each concept using supervised learning, and then indexes the corpus with the concept detectors. Since each concept has an explicit semantic meaning, we call this kind of concept an explicit concept. The bottom-up approach directly discovers the underlying latent topics of the video corpus from the machine perspective using unsupervised learning. The video corpus is then indexed by these latent topics. In contrast to explicit concepts, we call latent topics implicit concepts. Because the explicit concept set is pre-defined and independent of the corpus, it cannot completely describe the corpus and users' queries. On th...