We propose a robust scene recognition system for baseball broadcast videos. The system follows the data-driven approach that has proven successful in continuous speech recognition. Each scene is modeled by a multi-stream hidden Markov model (HMM), and an unsupervised adaptation method provides robustness against differences in environmental conditions among games. The system also employs an n-gram language model to represent the context among scenes, together with a model of scene length. We evaluated the proposed system in scene recognition experiments with 16 scene types taken from video data of 25 baseball games. The system reduced the scene recognition error rate by 6.3% absolute.

Categories and Subject Descriptors
I.2.10 [Vision and Scene Understanding]: Video analysis; I.4.8 [Scene Analysis]: Time-varying imagery; H.2.4 [System]: Multimedia databases

General Terms
Algorithms, Experimentation

Keywords
CBVIR, sports video, indexing, HMM, n-gram model, scene context, adaptation
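To illustrate how scene context of the kind mentioned in the abstract can change a recognition result, the following minimal sketch combines per-segment scene log-likelihoods (standing in for HMM scores) with a bigram scene model via Viterbi decoding. It is not the system described here: the function name `decode_scenes`, the `lm_weight` parameter, the back-off floor, the two scene labels, and all numbers are hypothetical, the video is assumed to be already segmented, and the multi-stream HMM, adaptation, and scene length model are omitted.

```python
import math


def decode_scenes(emission_ll, bigram_lp, initial_lp, lm_weight=1.0):
    """Return the best scene sequence for pre-segmented video (Viterbi).

    emission_ll: list of {scene: log-likelihood}, one dict per segment
    bigram_lp:   {(previous_scene, scene): log-probability}
    initial_lp:  {scene: log-probability of starting with that scene}
    lm_weight:   assumed scale factor on the scene-context scores
    """
    scenes = list(initial_lp)
    floor = math.log(1e-6)  # assumed score for bigrams missing from the table

    # best[s] holds (score, path) of the best hypothesis ending in scene s
    best = {
        s: (lm_weight * initial_lp[s] + emission_ll[0][s], [s]) for s in scenes
    }
    for obs in emission_ll[1:]:
        new_best = {}
        for cur in scenes:
            score, path = max(
                (
                    (
                        best[prev][0]
                        + lm_weight * bigram_lp.get((prev, cur), floor),
                        best[prev][1],
                    )
                    for prev in scenes
                ),
                key=lambda item: item[0],
            )
            new_best[cur] = (score + obs[cur], path + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy data (invented): the middle segment is visually ambiguous.
emission_ll = [
    {"pitch": -1.0, "hit": -5.0},
    {"pitch": -2.0, "hit": -2.2},
    {"pitch": -1.0, "hit": -5.0},
]
bigram_lp = {
    ("pitch", "hit"): math.log(0.9),
    ("pitch", "pitch"): math.log(0.1),
    ("hit", "pitch"): math.log(0.9),
    ("hit", "hit"): math.log(0.1),
}
initial_lp = {"pitch": math.log(0.5), "hit": math.log(0.5)}

with_context = decode_scenes(emission_ll, bigram_lp, initial_lp)
without_context = decode_scenes(emission_ll, bigram_lp, initial_lp, lm_weight=0.0)
```

In this toy setting the per-segment scores alone label the ambiguous middle segment as `pitch`, whereas adding the bigram scores changes it to `hit`, because a pitch followed directly by another pitch is assumed to be unlikely.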