Automatic extraction of sports video highlights is a typical personalized media production task. Many approaches have been studied, ranging from low-level audio/visual processing (e.g., detection of excited commentator speech) to event detection (e.g., goal detection). However, the subjectivity of highlights remains an unavoidable bottleneck. The replay scene is an effective cue for highlights in broadcast sports video because it incorporates video production knowledge. Most related work addresses replay detection and/or a simple composition of all detected replays to generate highlights. In contrast, our work accounts for the differing preferences of viewers regarding highlight content or type through replay scene classification. The main contributions are: 1) proposing a multi-modal (visual + textual) approach for refined replay classification; 2) employing Broadcast Web Text (BWT) sources to facilitate replay content analysis. An ...