In this paper, we propose an effective coarse-to-fine algorithm for detecting text in video. First, in the coarse-detection stage, a stroke filter is employed to detect all candidate stroke pixels, and a fast region-growing method then connects these pixels into regions, which are further separated into candidate text lines by a projection operation. Second, in the fine-detection stage, correct text regions are selected from the candidates using a support vector machine (SVM) model and stroke features, and the text regions detected at multiple resolutions are integrated. Finally, the result is significantly improved by exploiting temporal correlation information. Experimental results show that our algorithm achieves real-time performance and is robust to variations in the language, font, size, and color of text, as well as to noise caused by low frame resolution in video.
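
The projection operation used to separate a grown region into candidate text lines can be illustrated with a minimal sketch. It assumes the region is given as a binary mask of candidate stroke pixels and that lines are cut wherever the horizontal projection profile (the per-row pixel count) falls below a threshold. The function name `split_rows_by_projection` and the parameter `min_count` are hypothetical, chosen for illustration only, and the sketch is not the exact procedure of the proposed algorithm.

```python
from typing import List, Tuple


def split_rows_by_projection(
    mask: List[List[int]], min_count: int = 1
) -> List[Tuple[int, int]]:
    """Split a binary region mask into horizontal bands (candidate text lines).

    mask      -- rows of 0/1 values, 1 marking a candidate stroke pixel
    min_count -- hypothetical threshold: a row belongs to a text line when it
                 contains at least this many stroke pixels

    Returns (top, bottom) row ranges, with bottom exclusive.
    """
    # Horizontal projection profile: number of stroke pixels in each row.
    profile = [sum(row) for row in mask]

    lines: List[Tuple[int, int]] = []
    start = None  # top row of the band currently being tracked, if any
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y                  # a new band begins
        elif count < min_count and start is not None:
            lines.append((start, y))   # the band ended on the previous row
            start = None
    if start is not None:              # band runs to the bottom of the mask
        lines.append((start, len(profile)))
    return lines


# Toy region: two bands of stroke pixels separated by one empty row.
region = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]
candidate_lines = split_rows_by_projection(region)
```

Applying the same idea to column sums within each band would give a vertical projection, which is one plausible way to trim the left and right borders of a candidate line.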