Video caption detection and extraction is an important step for information retrieval in video databases. In this paper, we extract text information from video by fully utilizing the temporal information it contains. First, we create a binary abstract sequence from a video segment. By analyzing the statistical pixel changes in the sequence, we can effectively locate the frames at which captions appear and disappear. Finally, we extract the captions to create a summary of the video segment.
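
The pipeline outlined above can be illustrated with a minimal sketch. It is not the method of this paper, whose details are not given here. It assumes that captions are brighter than the background, that a fixed intensity threshold yields the binary sequence, and that a large net count of pixels switching on or off between consecutive binary frames marks a caption appearing or disappearing. The names `binarize`, `changed_pixels`, `caption_events`, and the parameters `threshold` and `min_change` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame, pixel values 0..255


def binarize(frame: Frame, threshold: int = 200) -> Frame:
    """Map bright pixels (assumed caption candidates) to 1, the rest to 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in frame]


def changed_pixels(a: Frame, b: Frame) -> Tuple[int, int]:
    """Count pixels that switch on (0 -> 1) and off (1 -> 0) from a to b."""
    on = off = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa == 0 and pb == 1:
                on += 1
            elif pa == 1 and pb == 0:
                off += 1
    return on, off


def caption_events(
    frames: List[Frame], min_change: int = 3, threshold: int = 200
) -> List[Tuple[int, str]]:
    """Return (frame_index, 'appear' | 'disappear') for large net changes.

    Assumption: a net gain of at least min_change bright pixels marks an
    appearing caption, and a net loss of the same size a disappearing one.
    """
    binary = [binarize(f, threshold) for f in frames]
    events = []
    for i in range(1, len(binary)):
        on, off = changed_pixels(binary[i - 1], binary[i])
        if on - off >= min_change:
            events.append((i, "appear"))
        elif off - on >= min_change:
            events.append((i, "disappear"))
    return events


if __name__ == "__main__":
    blank = [[0] * 4 for _ in range(2)]
    caption = [[0, 255, 255, 0], [0, 255, 255, 0]]
    # Caption is absent in frames 0-1, present in frames 2-3, absent in frame 4.
    print(caption_events([blank, blank, caption, caption, blank]))
    # prints [(2, 'appear'), (4, 'disappear')]
```

The frames between an "appear" event and the next "disappear" event would then be the candidates from which caption text is extracted for the summary. In practice the fixed thresholds would likely need to be replaced by statistics estimated from the sequence itself.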