Robust extraction of text from scene images is essential for successful scene text recognition. Scene images often exhibit non-uniform illumination, complex backgrounds, and text-like objects, so the common assumption of a homogeneous text region on a nearly uniform background does not hold in real applications. We propose a text extraction method that uses a hint from the user about the location of the text within the image. A resizable square rim in the viewfinder of the mobile camera, referred to here as a “focus”, is the interface with which the user indicates the target text. With the hint from the focus, the color of the target text is easily estimated by clustering colors only within the focused section. The image is then binarized with the estimated color to extract connected components. After the text region within the focused section is obtained, it is expanded iteratively by searching neighboring regions with the updated text color. Such an it...
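The pipeline described above (color clustering inside the focus, binarization by the estimated color, and iterative expansion with an updated text color) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-cluster k-means, the rule that the minority cluster inside the focus is the text, the Euclidean RGB tolerance `tol`, the search margin `step`, and all function names are assumptions made for illustration. Connected-component analysis is omitted for brevity, and the text region is tracked as a bounding box `(x0, y0, x1, y1)` with exclusive upper bounds.

```python
from math import dist


def two_means(colors, iters=10):
    """Cluster RGB tuples into two groups; return (centroids, labels)."""
    # Deterministic seeding: darkest and brightest colors (an assumption).
    c = [min(colors, key=sum), max(colors, key=sum)]
    labels = [0] * len(colors)
    for _ in range(iters):
        labels = [0 if dist(p, c[0]) <= dist(p, c[1]) else 1 for p in colors]
        for k in (0, 1):
            members = [p for p, lab in zip(colors, labels) if lab == k]
            if members:
                c[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return c, labels


def estimate_text_color(image, box):
    """Estimate the text color from pixels inside the focus box only.

    Assumption: the text is the minority cluster within the focus.
    """
    x0, y0, x1, y1 = box
    colors = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    c, labels = two_means(colors)
    minority = 0 if labels.count(0) <= labels.count(1) else 1
    return c[minority]


def text_pixels(image, box, color, tol):
    """Binarize inside box: keep pixels within tol of the text color."""
    x0, y0, x1, y1 = box
    return [(x, y) for y in range(y0, y1) for x in range(x0, x1)
            if dist(image[y][x], color) <= tol]


def bounding_box(points):
    """Tight bounding box of (x, y) points, upper bounds exclusive."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def expand_text_region(image, focus, tol=60.0, step=1, max_iter=100):
    """Grow the text region outward from the focus until it stops changing."""
    h, w = len(image), len(image[0])
    color = estimate_text_color(image, focus)
    found = text_pixels(image, focus, color, tol)
    if not found:
        return focus, color
    box = bounding_box(found)
    for _ in range(max_iter):
        x0, y0, x1, y1 = box
        # Search a margin of `step` pixels around the current region.
        search = (max(x0 - step, 0), max(y0 - step, 0),
                  min(x1 + step, w), min(y1 + step, h))
        found = text_pixels(image, search, color, tol)
        if not found:
            break
        new_box = bounding_box(found)
        if new_box == box:
            break  # no neighboring text pixels were added
        box = new_box
        # Update the text color from all pixels currently labeled as text.
        samples = [image[y][x] for x, y in found]
        color = tuple(sum(ch) / len(samples) for ch in zip(*samples))
    return box, color
```

Here `image` is assumed to be a list of rows of RGB tuples. In this sketch `step` bounds the gap that the expansion can bridge, so a larger value would be needed to jump between separate characters; a fuller version would operate on connected components rather than on a single bounding box.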