This paper proposes a method for segmenting and recognizing text embedded in video and images. The method performs multiple segmentations of the same text region, producing multiple hypotheses of binary text images. The segmentation algorithm is formulated as a statistical labeling problem and is based on a Markov random field (MRF) model of the label map. Background regions in each hypothesis are then removed by performing a connected component analysis and by enforcing a more stringent constraint (called GCC) on the text characters' grayscale values using a robust 1D-Median operator. Each text image hypothesis is then processed by optical character recognition (OCR) software, and the final result is selected from the set of output strings. Results show that both the use of multiple hypotheses and the GCC significantly improve recognition.
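The hypothesis generation and background removal steps can be illustrated with a minimal sketch. It is not the paper's implementation: equal-width gray-level quantization stands in for the MRF labeling, and the median-plus-deviation test is only one plausible reading of a grayscale constraint built on a 1D median. The function names, the `tol` parameter, and the choice of class counts are all assumptions made for illustration, and the OCR and selection stages are omitted.

```python
from collections import deque
from statistics import median


def segment_hypotheses(gray, class_counts=(2, 3)):
    """Assumed stand-in for MRF labeling: for each K, quantize the image into
    K equal-width gray bands and treat each band as one binary hypothesis."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    hypotheses = []
    for k in class_counts:
        width = (hi - lo) / k or 1.0  # guard against a flat image
        for label in range(k):
            mask = [[1 if min(int((v - lo) / width), k - 1) == label else 0
                     for v in row] for row in gray]
            hypotheses.append(mask)
    return hypotheses


def connected_components(mask):
    """Return the 4-connected components of a binary mask as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, comp = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components


def gray_consistency_filter(gray, mask, tol=3.0):
    """Assumed grayscale constraint: drop components whose median gray level
    lies more than tol * (median absolute deviation) from the layer median."""
    comps = connected_components(mask)
    values = [gray[y][x] for comp in comps for (y, x) in comp]
    out = [[0] * len(mask[0]) for _ in mask]
    if not values:
        return out
    center = median(values)
    spread = max(median([abs(v - center) for v in values]), 1.0)
    for comp in comps:
        comp_median = median([gray[y][x] for (y, x) in comp])
        if abs(comp_median - center) <= tol * spread:
            for y, x in comp:
                out[y][x] = 1
    return out


# Three bright "character" strokes and one darker background blob (last column).
gray = [[200, 10, 205, 10, 198, 10, 120],
        [202, 10, 201, 10, 199, 10, 122]]
mask = [[1, 0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0, 1]]
cleaned = gray_consistency_filter(gray, mask)
```

In this toy input the last column belongs to the candidate mask but has a gray level far from the median of the layer, so the filter removes it while keeping the three strokes. In a full pipeline, each cleaned hypothesis would be passed to an OCR engine and one output string would be selected.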