This paper addresses the problem of matching visual parts of video sequences within a large collection. The visual content of a video sequence is described by the set of the most representative local features extracted from its frames. We propose an index that efficiently retrieves the pre-computed distances between every sequence in the collection and every local feature. Relying on this structure, partial matching is performed by an interactive feature-selection algorithm that iteratively incorporates user knowledge to estimate a model of the queried pattern. We show that the method performs well on large video collections, in terms of both retrieval accuracy and response time.