This paper describes a fast audio detection method for specific audio retrieval in the AV stream. The method is a histogram matching algorithm based on structural and perceptual features. This algorithm extracts audio features based on human perception on the sound scene and locates the special audio clip by fast histogram matching. Experimental results based on the advertisement detection in TV program showed that the algorithm can achieve a very high overall precision and recall rate both about 97% with very fast search time about 1/40 on real time.