This paper proposes a method for automatically extracting highlight scenes from live sports (baseball) video in real time and for letting users retrieve them. To this end, sophisticated speech recognition is used to convert the speech signal into text and to extract a set of keywords in real time. Image processing detects pitcher scenes, also in real time, and delimits pitching sections, each starting at one pitcher scene and ending at the next. Highlight scenes are then the pitching sections in which keywords such as "home run", "two-base hit", and "three-base hit" are extracted from the speech signal.

Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Audio input/output, Video; I.5.4 [Pattern Recognition]: Applications—Signal processing.

General Terms
Experimentation.

Keywords
highlight scenes, sports live video, speech recognition, acoustic model adaptation, language model adaptation.
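The selection rule stated in the abstract (a pitching section runs from one pitcher scene to the next, and it becomes a highlight when a highlight keyword is recognized inside it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the `Highlight` record, and the input format (pitcher-scene timestamps in seconds, plus timestamped recognized words) are assumptions made for illustration, and the sketch processes a finished list of detections rather than a live stream.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Keywords named in the abstract; a real system would likely use a longer list.
HIGHLIGHT_KEYWORDS = {"home run", "two-base hit", "three-base hit"}


@dataclass(frozen=True)
class Highlight:
    """Hypothetical record for one highlight: a pitching section plus its keywords."""
    start: float
    end: float
    keywords: tuple


def extract_highlights(pitcher_times, keyword_hits):
    """Return the pitching sections that contain at least one highlight keyword.

    pitcher_times: timestamps (seconds) at which a pitcher scene was detected.
    keyword_hits:  (timestamp, word) pairs produced by speech recognition.
    Both input formats are assumptions of this sketch.
    """
    ts = sorted(pitcher_times)
    found = {}  # section index -> keywords heard inside that section
    for t, word in keyword_hits:
        if word not in HIGHLIGHT_KEYWORDS:
            continue
        # Index of the last pitcher scene at or before time t.
        i = bisect_right(ts, t) - 1
        # Keep only hits inside a closed section [ts[i], ts[i + 1]).
        if 0 <= i < len(ts) - 1:
            found.setdefault(i, []).append(word)
    return [Highlight(ts[i], ts[i + 1], tuple(found[i])) for i in sorted(found)]


if __name__ == "__main__":
    pitcher_times = [0.0, 25.0, 60.0, 95.0]
    keyword_hits = [
        (10.0, "strike"),
        (40.0, "home run"),
        (70.0, "two-base hit"),
        (80.0, "home run"),
        (100.0, "home run"),  # after the last pitcher scene: section still open
    ]
    for h in extract_highlights(pitcher_times, keyword_hits):
        print(h)
```

In this sketch a keyword heard after the last detected pitcher scene is ignored, because its section has no end point yet; a real-time variant would presumably hold such hits until the next pitcher scene arrives.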