Text detection in video images, and scene text detection in particular, has received increasing attention because it plays a vital role in video indexing and information retrieval. This paper proposes a new and robust gradient difference technique for detecting both graphics text and scene text in video images. Instead of the conventional projection-profile-based method, which fails to fix bounding boxes when the detected text lines are not properly spaced, the technique introduces the concept of zero crossing to determine the bounding boxes of the detected text lines. We demonstrate the capability of the proposed technique through experiments on video images containing both graphics text and scene text with different font shapes and sizes, languages, text directions, backgrounds, and contrasts. Our experimental results show that the proposed technique outperforms existing methods in terms of detection rate on a large video image database.
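
To make the contrast between gap-based projection profiles and zero crossing concrete, the following toy sketch may help. It is not the procedure of this paper, whose details are not given above. It assumes that the gradient difference is the maximum minus the minimum horizontal gradient in a small sliding window, and that a "zero crossing" is a sign change of the row profile after its mean is subtracted. The names `gradient_difference`, `row_profile`, `split_by_gaps`, `split_by_zero_crossings`, and the window size `n` are hypothetical and chosen only for illustration.

```python
def gradient_difference(row, n=3):
    """Assumed definition: max minus min horizontal gradient in a window of n values."""
    grad = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    half = n // 2
    out = []
    for i in range(len(grad)):
        window = grad[max(0, i - half): i + half + 1]
        out.append(max(window) - min(window))
    return out


def row_profile(image, n=3):
    """Sum the gradient difference along each image row."""
    return [sum(gradient_difference(row, n)) for row in image]


def runs(flags):
    """Return (first, last) index pairs for each maximal run of True values."""
    bands, start = [], None
    for y, flag in enumerate(flags):
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(flags) - 1))
    return bands


def split_by_gaps(profile):
    """Projection-profile style: a text line ends only at an all-zero row."""
    return runs([v > 0 for v in profile])


def split_by_zero_crossings(profile):
    """Assumed zero-crossing style: split where the mean-centred profile changes sign."""
    mean = sum(profile) / len(profile)
    return runs([v - mean > 0 for v in profile])


# Toy image: two high-contrast text lines joined by a weak row (no empty gap).
blank = [0, 0, 0, 0, 0, 0]
text = [0, 9, 0, 9, 0, 9]
weak = [0, 0, 0, 3, 0, 0]
image = [blank, text, text, weak, text, text, blank]

profile = row_profile(image)
print(profile)                           # [0, 90, 90, 18, 90, 90, 0]
print(split_by_gaps(profile))            # [(1, 5)]  both lines merged
print(split_by_zero_crossings(profile))  # [(1, 2), (4, 5)]  lines separated
```

In this toy case the gap-based split merges the two lines because no row of the profile is exactly zero, while the sign changes of the mean-centred profile still separate them. A real detector would operate on thresholded gradient maps of full video frames and would need a more careful reference level than the global mean.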