Accurate grouping of video shots could lead to semantic indexing of video segments for content analysis and retrieval. This paper introduces a novel cluster analysis that, depending on both the video genre and the specific user needs, produces a hierarchical representation of the video based on only a small number of significant summaries. A possible implementation strategy is then outlined. Specifically, vector-quantization codebooks are used to represent the visual content and to cluster shots with similar chromatic consistency. The codebook distortion introduced in each cluster is evaluated in order to stop the procedure after a few levels, exploiting the dependency relationships between clusters. Finally, the user can navigate through the summaries at each hierarchical level and then decide which level to adopt for possible post-processing. The effectiveness of the proposed method is validated through a series of experiments on real visual data excerpted from diffe...
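The clustering idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: every function name (`train_codebook`, `cross_distortion`, `cluster_shots`) and the `max_distortion` threshold are hypothetical. The sketch also assumes that each shot is given as a list of RGB colour vectors, that a toy Lloyd (k-means) iteration is an acceptable stand-in for the codebook design, and that the extra distortion incurred when one shot is coded with another shot's codebook is a usable similarity measure.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two colour vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_codebook(vectors, size=2, iters=10):
    """Toy Lloyd (k-means) codebook with a deterministic init (assumption)."""
    pts = sorted(vectors)
    size = min(size, len(pts))
    step = len(pts) / size
    codebook = [pts[int(i * step)] for i in range(size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for p in pts:
            nearest = min(range(len(codebook)),
                          key=lambda k: sq_dist(p, codebook[k]))
            cells[nearest].append(p)
        # Move each codeword to its cell centroid; keep it if the cell is empty.
        codebook = [
            tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else cw
            for cell, cw in zip(cells, codebook)
        ]
    return codebook


def distortion(vectors, codebook):
    """Mean squared error of quantizing `vectors` with `codebook`."""
    return sum(min(sq_dist(p, cw) for cw in codebook)
               for p in vectors) / len(vectors)


def cross_distortion(a, b, size=2):
    """Extra distortion when each set is coded with the other's codebook."""
    cb_a, cb_b = train_codebook(a, size), train_codebook(b, size)
    return (abs(distortion(a, cb_b) - distortion(a, cb_a))
            + abs(distortion(b, cb_a) - distortion(b, cb_b)))


def cluster_shots(shots, max_distortion, size=2):
    """Greedy agglomerative grouping; returns one partition per level."""
    def pooled(cluster):
        return [v for i in cluster for v in shots[i]]

    clusters = [[i] for i in range(len(shots))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters whose merge costs the least.
        cost, x, y = min(
            (cross_distortion(pooled(clusters[x]), pooled(clusters[y]), size), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if cost > max_distortion:
            break  # a further merge would introduce too much distortion
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels


# Made-up data: two reddish shots and one bluish shot.
shots = [
    [(250, 10, 10), (240, 20, 15), (255, 5, 5)],
    [(245, 15, 12), (250, 8, 9), (238, 22, 18)],
    [(10, 10, 250), (20, 15, 240), (5, 5, 255)],
]
levels = cluster_shots(shots, max_distortion=500.0)
print(levels[-1])  # the two reddish shots end up in one cluster
```

In this sketch, each entry of `levels` is one level of the hierarchy, so a user could inspect the partitions from finest to coarsest and pick the one to keep. The threshold plays the role of the distortion-based stopping rule: merging halts as soon as the cheapest remaining merge would exceed it.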