Natural scene images brought new challenges for a few years and one of them is text understanding over images or videos. Text extraction which consists to segment textual foreground from the background succeeds using color information. Faced to the large diversity of text information in daily life and artistic ways of display, we are convinced that this only information is no more enough and we present a color segmentation algorithm using spatial information. Moreover, a new method is proposed in this paper to handle uneven lighting, blur and complex backgrounds which are inherent degradations to natural scene images. To merge text pixels together, complementary clustering distances are used to support simultaneously clear and well-contrasted images with complex and degraded images. Tests on a public database show finally efficiency of the whole proposed method.