We present a top-down statistical modeling approach for exploring the semantic structure of American football video. First, we define a semantic space in which the video's semantic structure is characterized by semantic units, a dynamic model over those units, and an observation model that maps the semantic units to visual features. We then propose a new hidden Markov model (HMM)-based generative model for American football video analysis, in which the semantic units are latent (hidden) states corresponding to four different camera views of the football field. A set of relevant visual features is selected for HMM training on the basis of information gain. Two kinds of state emission function, a single Gaussian and a Gaussian mixture model (GMM), each characterizing the observation density associated with a latent state, are tested in the proposed HMM for camera view-based video analysis. Experimental results on several real football videos demonstrate the effe...
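To make the camera-view HMM concrete, the following is a minimal sketch of how a view sequence could be decoded once such a model has been trained. It assumes single-Gaussian emissions with diagonal covariance and uses standard Viterbi decoding, which is a common choice for HMMs but is not confirmed here as the decoding procedure used in this work. The function names (`log_gaussian`, `viterbi`), the one-dimensional feature, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def viterbi(observations, log_start, log_trans, means, variances):
    """Return the most likely hidden-state sequence for a Gaussian-emission HMM."""
    n_states = len(log_start)
    # delta[s]: best log-probability of any state path ending in state s
    delta = [
        log_start[s] + log_gaussian(observations[0], means[s], variances[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s under the transition model
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_delta.append(
                delta[best_prev]
                + log_trans[best_prev][s]
                + log_gaussian(x, means[s], variances[s])
            )
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy parameters, made up for illustration (not taken from the paper).
N_VIEWS = 4  # four latent states, one per camera view
STAY = 0.7   # assumed probability of remaining in the same view
log_start = [math.log(1.0 / N_VIEWS)] * N_VIEWS
log_trans = [
    [math.log(STAY if i == j else (1.0 - STAY) / (N_VIEWS - 1)) for j in range(N_VIEWS)]
    for i in range(N_VIEWS)
]
means = [[0.0], [5.0], [10.0], [15.0]]  # one 1-D feature per view
variances = [[1.0], [1.0], [1.0], [1.0]]

# A hypothetical sequence of per-shot feature values
shots = [[0.2], [0.1], [4.8], [5.3], [9.9], [14.7], [15.2]]
decoded = viterbi(shots, log_start, log_trans, means, variances)
print(decoded)
```

With these toy parameters the state means are well separated, so the decoded sequence is `[0, 0, 1, 1, 2, 3, 3]`. A GMM emission would replace `log_gaussian` with the log of a weighted sum of component densities, leaving the rest of the decoder unchanged.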