This paper explores an approach for extracting scene text from a sequence of images with relative motion between the camera and the scene. The scene text is assumed to lie on planar surfaces, whereas other scene features are likely to lie at arbitrary depths or to undergo independent motion. The motion model parameters of these planar surfaces are estimated using gradient-based methods and multiple-motion segmentation. The equations of the planar surfaces, as well as the camera motion parameters, are extracted by combining the motion models of the individual planar surfaces. This combination is expected to improve the reliability and robustness of the estimates, which are then used to perform perspective correction on the individual surfaces. Perspective correction, in turn, can improve OCR performance. This work could be useful for detecting road signs and billboards from a moving vehicle.
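
As a concrete illustration of such a motion model (an assumed formulation for this sketch; the abstract does not state the exact parameterization used in the paper), the image motion induced by a rigid planar surface between two views is commonly described by a plane-induced homography:
\[
  \mathbf{x}' \sim H\,\mathbf{x}, \qquad
  H = K\!\left(R + \frac{\mathbf{t}\,\mathbf{n}^{\top}}{d}\right)K^{-1},
\]
where $\mathbf{x}$ and $\mathbf{x}'$ are homogeneous image coordinates of a plane point in the two views, $K$ is the camera intrinsic matrix, $(R, \mathbf{t})$ is the camera motion, and $\mathbf{n}$, $d$ are the plane's unit normal and its distance from the first camera. Under this model, estimating one homography per text-bearing plane (for example, by gradient-based minimization of an intensity-matching error) and decomposing the homographies jointly recovers a single $(R, \mathbf{t})$ together with each plane's $(\mathbf{n}, d)$; perspective correction then amounts to warping each text region to a fronto-parallel view before OCR.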