We describe a method for selecting specific text regions with a hand-held camera by projecting a structured-light laser pointer onto the document. The user indicates the desired text by sweeping the pointer across it while a sequence of images is captured. By tracking the camera's motion over the document and matching the resulting trajectory to the underlying text, the system determines precisely which portion of text the user intended to capture. Combined with a text-processing pipeline and OCR, this selection method could let a general-purpose hand-held device with a camera (such as a PDA or mobile phone) be used as effectively as a single-purpose pen-scanning device. We present results demonstrating successful capture and extraction of text.
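The step of matching a pointer trajectory to the text can be illustrated with a minimal sketch. It is not the method described here. It assumes that camera tracking has already mapped each pointer position into document coordinates and that a layout or OCR stage supplies word bounding boxes in the same frame. With those inputs, it applies a simple heuristic in place of the actual matching: it snaps the noisy trajectory to the nearest text line and keeps the words on that line that the sweep overlaps horizontally. The names `Word` and `select_text` are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    """A recognised word and its bounding box in document coordinates (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cy(self) -> float:
        # Vertical centre of the box, used to find the line nearest the sweep.
        return (self.y0 + self.y1) / 2


def select_text(trajectory: list[tuple[float, float]], words: list[Word]) -> str:
    """Return the text a pointer sweep most plausibly covers (heuristic sketch).

    trajectory: pointer positions (x, y), assumed already in document coordinates.
    words: word boxes from an assumed layout/OCR stage, same coordinate frame.
    """
    if not trajectory or not words:
        return ""
    # Snap the jittery trajectory to one text line: take the median pointer
    # height and find the word whose vertical centre is closest to it.
    y_ref = median(y for _, y in trajectory)
    anchor = min(words, key=lambda w: abs(w.cy - y_ref))
    line_y = anchor.cy
    # Every word whose vertical extent contains that height is on the same line.
    line = [w for w in words if w.y0 <= line_y <= w.y1]
    # Keep words whose horizontal extent overlaps the span the pointer swept.
    x_lo = min(x for x, _ in trajectory)
    x_hi = max(x for x, _ in trajectory)
    chosen = [w for w in line if w.x1 >= x_lo and w.x0 <= x_hi]
    # Emit the selection in left-to-right reading order.
    chosen.sort(key=lambda w: w.x0)
    return " ".join(w.text for w in chosen)
```

For example, a sweep from x = 40 to x = 100 along the first of two lines, "The quick brown" and "fox jumps", selects `"quick brown"`. A real system would need to handle sweeps that span several lines, skewed text, and tracking drift, which this single-line heuristic ignores.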