We present a generative-model approach to exploring intrinsic semantic structures in sports videos, e.g., the camera view in American football games. We invoke the concept of a semantic space to explicitly define the semantic structure of a video in terms of latent states. A dynamic model governs the transitions between states, and an observation model characterizes the visual features associated with each state. The problem is then formulated as statistical inference: we infer the latent states (i.e., camera views) from the observations (i.e., visual features). We study two generative models, the hidden Markov model (HMM) and the segmental HMM (SHMM). In the HMM, both latent states and visual features are defined at the shot level; in the SHMM, latent states are defined for shots and visual features for frames. Both models provide promising performance for view-based shot classification, and the SHMM outperforms the HMM...
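
To make the inference step concrete, the following is a minimal sketch of decoding latent states from observations in a shot-level HMM. It is an illustration under stated assumptions, not the method used in this work: the abstract does not name a decoding algorithm, so Viterbi decoding is assumed here; the visual features are replaced by discrete symbols; and the two states, the function name `viterbi`, and all probabilities are hypothetical.

```python
import math


def _log(p):
    """Log-probability that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Return the most likely state sequence for a discrete-observation HMM.

    obs   : list of observation symbol indices (one per shot, by assumption)
    init  : init[s]     = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o]  = P(observing symbol o | state is s)
    """
    n_states = len(init)
    # score[s] = best log-probability of any state path ending in state s
    score = [_log(init[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans[best][s]) + _log(emit[s][o]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-view model: states 0 and 1 stand for two camera views,
# and each shot yields one of two quantized feature symbols (0 or 1).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]  # views tend to persist across shots
emit = [[0.8, 0.2], [0.2, 0.8]]   # each view favors its own symbol

print(viterbi([0, 0, 1, 0, 0], init, trans, emit))
print(viterbi([0, 0, 1, 1, 1], init, trans, emit))
```

In this toy setting the dynamic model smooths over an isolated outlier symbol, while a sustained change in the observations leads to a change of decoded state. A segmental model would instead score a whole run of frame-level observations per shot, which this sketch does not attempt.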