Automatic semantic annotation of sports video requires that the domain knowledge is properly included and exploited in the annotation process and that low and intermediate-level features are conveniently selected, extracted from the video and combined so that their spatio-temporal combinations identify the prominent highlights. Spatial and temporal extensions of the highlights must be precisely detected in order to permit the extraction of the most salient parts of the video and construct automatically meaningful trailers and summaries. In this paper we propose finite state machine models and model checking as a general and effective approach to model sports highlights and detect their occurrence automatically, from the analysis of the video visual cues. The approach has been tested on soccer and swimming that are representatives of two different typologies of sports, respectively team sport and race sport. Keywords automatic video annotation, model checking, sports video