We present a new adaptive algorithm for automatic detection of text from a natural scene. The initial cues of text regions are first detected from the captured image/video. An adaptive color modeling and searching algorithm is then utilized near the initial text cues, to discriminate text/non-text regions. EM optimization algorithm is used for color modeling, under the constraint of text layout relations for a specific language. The proposed algorithm combines the advantages of several previous approaches for text detection, and utilizes a focus-of-attention approach for text finding. The whole algorithm is applied in a prototype system that can automatically detect and recognize sign input from a video camera, and translate the signs into English text or voice streams. We present evaluation results of our algorithm on this system.