This study proposes a novel scheme to extract and recognize the contents of various sports captions. First, a caption extraction process based on an iterative temporal averaging technique detects and locates the caption region in a series of video frames. Next, a caption-content extraction process based on caption identification and model-based segmentation accurately extracts the contents of the captions. Finally, the low-quality character images extracted from the caption contents are recognized using a commercial OCR engine. Experimental results show that the proposed model-based segmentation approach is very efficient at extracting the contents of various sports captions. Furthermore, compared with the projection-based segmentation method, the segmentation approach improves recognition performance by about 7.72% on the numeral test set.
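The abstract does not spell out the iterative temporal averaging step, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes grayscale frames held as NumPy arrays of equal shape and assumes that a caption stays nearly constant over time while the surrounding scene changes. The function names `running_average` and `locate_static_region` and the tolerance `tol` are hypothetical and chosen for illustration.

```python
import numpy as np


def running_average(frames):
    """Iteratively update a per-pixel temporal mean, one frame at a time."""
    mean = None
    for k, frame in enumerate(frames, start=1):
        f = frame.astype(np.float64)
        # Incremental mean update: mean_k = mean_{k-1} + (f_k - mean_{k-1}) / k
        mean = f if mean is None else mean + (f - mean) / k
    return mean


def locate_static_region(frames, tol=10.0):
    """Bounding box (top, bottom, left, right), inclusive, of pixels that
    stay within `tol` gray levels of their temporal mean in every frame.

    `frames` must be a list (it is traversed twice). Returns None when the
    list is empty or no pixel is stable. The threshold rule is an assumption
    made for this sketch, not a rule taken from the paper.
    """
    mean = running_average(frames)
    if mean is None:
        return None
    max_dev = np.zeros_like(mean)
    for frame in frames:
        dev = np.abs(frame.astype(np.float64) - mean)
        max_dev = np.maximum(max_dev, dev)
    stable = max_dev < tol
    if not stable.any():
        return None
    rows = np.flatnonzero(stable.any(axis=1))
    cols = np.flatnonzero(stable.any(axis=0))
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])
```

In this sketch the moving background drifts away from its temporal mean while a static overlay does not, so the stable pixels outline a candidate caption region. A practical system would likely need extra steps, such as handling semi-transparent captions and static scene content, which are not covered here.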