We propose a robust scene recognition framework for multimedia content that exploits scene context information. In multimedia content, some scene sequences are more likely to occur than others. We take a statistical approach to this scene context information: a hidden Markov model (HMM) models each scene, and an n-gram language model represents the context among scenes. We evaluated the proposed method in scene recognition experiments on 16 scenes in video data from 25 baseball games. The proposed method significantly improved recognition results compared with the same method without scene context information.

Categories and Subject Descriptors
I.2.10 [Vision and Scene Understanding]: Video analysis; I.4.8 [Scene Analysis]: Time-varying imagery

General Terms
Algorithms, Experimentation

Keywords
CBVIR, sports video, indexing, HMM, n-gram model
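
As an illustration of how per-scene HMM scores and an n-gram scene model might be combined, the following is a minimal sketch and not the implementation used in this work. It assumes a bigram model (n = 2), assumes the video has already been split into segments, and assumes each scene HMM has already produced a log-likelihood for every segment. The function name, the `lm_weight` parameter, the scene labels, and all probabilities are hypothetical.

```python
import math

def decode_scenes(segment_loglik, bigram_logprob, start_logprob, lm_weight=1.0):
    """Viterbi search for the most likely scene sequence.

    segment_loglik: list of dicts, one per segment, mapping scene -> log P(segment | scene)
                    (assumed to come from the per-scene HMMs).
    bigram_logprob: dict mapping (previous scene, current scene) -> log P(current | previous).
    start_logprob:  dict mapping scene -> log P(scene is first).
    lm_weight:      hypothetical scaling of the context model; 0 disables scene context.
    """
    scenes = list(start_logprob)
    # best[s] is the best log score of any scene sequence ending in scene s.
    best = {s: lm_weight * start_logprob[s] + segment_loglik[0][s] for s in scenes}
    backpointers = []
    for obs in segment_loglik[1:]:
        new_best, ptr = {}, {}
        for cur in scenes:
            prev = max(scenes, key=lambda p: best[p] + lm_weight * bigram_logprob[(p, cur)])
            new_best[cur] = best[prev] + lm_weight * bigram_logprob[(prev, cur)] + obs[cur]
            ptr[cur] = prev
        best = new_best
        backpointers.append(ptr)
    # Trace back from the best final scene.
    last = max(scenes, key=best.get)
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]

# Toy data with two made-up scene labels; the middle segment is ambiguous.
log = math.log
start = {"pitch": log(0.9), "hit": log(0.1)}
bigram = {("pitch", "pitch"): log(0.2), ("pitch", "hit"): log(0.8),
          ("hit", "pitch"): log(0.9), ("hit", "hit"): log(0.1)}
scores = [{"pitch": log(0.9), "hit": log(0.1)},
          {"pitch": log(0.55), "hit": log(0.45)},
          {"pitch": log(0.8), "hit": log(0.2)}]

with_context, _ = decode_scenes(scores, bigram, start)
without_context, _ = decode_scenes(scores, bigram, start, lm_weight=0.0)
print(with_context, without_context)
```

In this toy setup the ambiguous middle segment is labeled differently once the bigram scores are included, which is the kind of effect scene context is meant to have. Setting `lm_weight=0.0` reduces the search to picking the best-scoring scene for each segment independently.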