This paper proposes a novel hybrid method to robustly and accurately localize texts in natural scene images. A text region detector is designed to generate a text confidence map, based on which text components can be segmented by local binarization approach. A Conditional Random Field (CRF) model, considering the unary component property as well as binary neighboring component relationship, is then presented to label components as ”text” or ”non-text”. Last, text components are grouped into text lines with an energy minimization approach. Experimental results show that the proposed method gives promising performance comparing with the existing methods on ICDAR 2003 competition dataset.