In this paper, we present an index-structure-based method for fast and robust search of short video clips in large video collections. We first temporally segment a given long video stream into overlapping matching windows, then map the features extracted from these windows to points in a high-dimensional feature space, and construct index structures over these feature points for the querying process. Unlike linear-scan similarity matching methods, the querying process can be accelerated by the spatial pruning that an index structure provides. A multi-resolution kd-tree (mrkd-tree) is employed to perform exact K-NN queries and range queries, with the aim of quickly and precisely retrieving all short video segments that have the same content as the query. In terms of feature representation, rather than selecting representative key frames, we develop a set of spatial-temporal features to globally capture the pattern of a short video clip (e.g., a commercial clip or a lead-in/lead-out clip) and combine it wi...
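The pipeline outlined above (overlapping windows, feature points, an index with spatial pruning, then exact K-NN and range queries) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: it uses a plain kd-tree rather than the mrkd-tree, the per-window feature is a placeholder (the mean of per-frame vectors) standing in for the spatial-temporal features, and all function names and parameters (`sliding_windows`, `window_feature`, `build_kdtree`, `knn_query`, `range_query`, `size`, `step`) are hypothetical.

```python
import heapq
import math


def sliding_windows(frames, size, step):
    """Yield (start_index, window); choosing step < size makes windows overlap."""
    for start in range(0, len(frames) - size + 1, step):
        yield start, frames[start:start + size]


def window_feature(window):
    """Placeholder feature (assumption): per-dimension mean of the frame vectors."""
    dims = len(window[0])
    return tuple(sum(f[d] for f in window) / len(window) for d in range(dims))


def build_kdtree(items, depth=0):
    """Build a plain kd-tree from a list of (point, label) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {
        "point": items[mid][0],
        "label": items[mid][1],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }


def knn_query(node, query, k, heap=None):
    """Exact k-NN search; returns (distance, label) pairs sorted by distance."""
    top_level = heap is None
    if top_level:
        heap = []  # max-heap on distance, stored as (-distance, label)
    if node is not None:
        d = math.dist(query, node["point"])
        if len(heap) < k:
            heapq.heappush(heap, (-d, node["label"]))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node["label"]))
        diff = query[node["axis"]] - node["point"][node["axis"]]
        near, far = ("left", "right") if diff < 0 else ("right", "left")
        knn_query(node[near], query, k, heap)
        # Spatial pruning: visit the far side only if the splitting plane
        # is closer than the current k-th best distance.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            knn_query(node[far], query, k, heap)
    if top_level:
        return sorted((-neg_d, label) for neg_d, label in heap)
    return heap


def range_query(node, query, radius, out=None):
    """Return the labels of all points within `radius` of `query`."""
    out = [] if out is None else out
    if node is None:
        return out
    if math.dist(query, node["point"]) <= radius:
        out.append(node["label"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    if diff <= radius:       # the query ball reaches into the left half-space
        range_query(node["left"], query, radius, out)
    if diff >= -radius:      # the query ball reaches into the right half-space
        range_query(node["right"], query, radius, out)
    return out


# Toy stream of 2-D per-frame vectors (synthetic data, for illustration only).
frames = [(float(i % 7), float((i * 3) % 5)) for i in range(40)]
items = [(window_feature(w), start) for start, w in sliding_windows(frames, 4, 2)]
tree = build_kdtree(items)

query = window_feature(frames[10:14])      # a clip cut from the stream itself
nearest = knn_query(tree, query, k=3)      # three closest windows
matches = range_query(tree, query, 0.5)    # every window within distance 0.5
```

In this sketch the window labels are start offsets into the stream, so each query result points back to a candidate segment. The comparison `abs(diff) < -heap[0][0]` is the pruning step that lets the search skip whole subtrees, which a linear scan cannot do.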