In this paper, we combine audio and video signatures for the detection of short video clips. First, we present an audio signature based on the spectral flatness measure. We then describe a spatio-temporal video signature based on the evolution of gray-level centroids over time. The main contribution of this work is the combination of these two signatures into an audiovisual signature, obtained by late integration of the similarity measures computed on each modality. Our experiments, conducted on a large video database (28 GB, 34 h extracted from TRECVID 2003), show that the audiovisual signature is more robust than either the audio-only or the video-only signature, and also allows better detection of shorter video clips (about 2 seconds).
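The two building blocks named above can be illustrated with a minimal sketch. It assumes the textbook definition of spectral flatness (geometric mean divided by arithmetic mean of a frame's power spectrum, computed here over the whole frame rather than per frequency band). It also assumes that late integration is a weighted sum of the per-modality similarities. The function names `spectral_flatness` and `late_fusion` and the parameter `weight_audio` are hypothetical; the abstract does not specify the band layout or the fusion rule.

```python
import math
from typing import Sequence


def spectral_flatness(power: Sequence[float], eps: float = 1e-12) -> float:
    """Spectral flatness of one frame: geometric mean / arithmetic mean.

    Values near 1 indicate a noise-like (flat) spectrum; values near 0
    indicate a tonal (peaky) spectrum. Assumption: whole-frame, not band-wise.
    """
    if not power:
        raise ValueError("empty spectrum")
    # eps avoids log(0) for silent bins and division by zero for silent frames
    log_mean = math.fsum(math.log(p + eps) for p in power) / len(power)
    arith_mean = math.fsum(power) / len(power)
    return math.exp(log_mean) / (arith_mean + eps)


def late_fusion(sim_audio: float, sim_video: float, weight_audio: float = 0.5) -> float:
    """Hypothetical late integration: weighted sum of per-modality similarities."""
    if not 0.0 <= weight_audio <= 1.0:
        raise ValueError("weight_audio must be in [0, 1]")
    return weight_audio * sim_audio + (1.0 - weight_audio) * sim_video


if __name__ == "__main__":
    print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))  # flat spectrum, close to 1
    print(spectral_flatness([1.0, 0.0, 0.0, 0.0]))  # single peak, close to 0
    print(late_fusion(0.9, 0.6, weight_audio=0.5))
```

A weighted sum is only one possible late-integration rule. Because each modality is scored independently before fusion, one modality can still contribute when the other is degraded.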