To manage the massive growth of sports video, we need to summarize its content into a more compact and interesting representation. Unlike previous work, which summarized either highlights or play scenes, we propose a unified summarization scheme that integrates both highlights and play-break scenes. To automate the process, we combine audio and visual features, which provides more accurate detection than either alone. We present fast algorithms for detecting whistles and excitement, taking advantage of the fact that audio features are computationally cheaper than visual features. However, because sports audio contains a large amount of noise, fast text-display detection is used to verify the detected highlights. The performance of these algorithms has been tested against one hour of soccer and swimming videos.

Categories and Subject Descriptors