This paper describes an approach to detecting caption text in video frames. Text recognition in video has many applications, but several problems remain, such as insufficient resolution and the complexity of layouts and backgrounds. This study addresses these problems with a segmentation-free approach called the MAP matching method. In addition to extending the method to grayscale images, the paper discusses a strategy for handling character size variation using Gaussian filtering and multi-sized reference patterns, as well as a method for detecting frames that contain caption text. Results show that the proposed matching method can detect characters of unknown size in caption text. Although over-detection is not negligible, verifying the positions of detected characters makes it possible to locate keywords with practical precision. Frames containing caption text are also detected with nearly 98% accuracy.
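
To make the idea of combining Gaussian filtering with multi-sized reference patterns concrete, the following is a minimal sketch and not the MAP matching method itself. It assumes a plain normalized cross-correlation score as a stand-in for the matching criterion, nearest-neighbor rescaling of a single reference glyph, and a fixed smoothing width; the function names (`smooth`, `resize_nearest`, `ncc`, `best_match`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(image, sigma):
    """Separable Gaussian blur of a 2-D list, replicating edge pixels."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, limit):
        return min(max(v, 0), limit - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[k] * image[y][clamp(x + k - radius, w)]
                 for k in range(len(kernel)))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k] * rows[clamp(y + k - radius, h)][x]
                 for k in range(len(kernel)))
             for x in range(w)] for y in range(h)]


def resize_nearest(pattern, size):
    """Rescale a 2-D pattern to size x size by nearest-neighbor sampling."""
    h, w = len(pattern), len(pattern[0])
    return [[pattern[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def ncc(image, top, left, pattern):
    """Normalized cross-correlation of a square pattern with an image window.

    Assumption: this score only stands in for the paper's matching criterion.
    """
    n = len(pattern)
    win = [image[top + y][left + x] for y in range(n) for x in range(n)]
    pat = [v for row in pattern for v in row]
    mean_w = sum(win) / len(win)
    mean_p = sum(pat) / len(pat)
    num = sum((a - mean_w) * (b - mean_p) for a, b in zip(win, pat))
    den = math.sqrt(sum((a - mean_w) ** 2 for a in win)
                    * sum((b - mean_p) ** 2 for b in pat))
    return num / den if den > 0 else 0.0


def best_match(image, pattern, sizes, sigma=1.0):
    """Slide every rescaled, smoothed reference pattern over the smoothed
    image and return (score, top, left, size) of the best-scoring window."""
    blurred = smooth(image, sigma)
    best = (-1.0, None, None, None)
    for size in sizes:
        reference = smooth(resize_nearest(pattern, size), sigma)
        for top in range(len(image) - size + 1):
            for left in range(len(image[0]) - size + 1):
                score = ncc(blurred, top, left, reference)
                if score > best[0]:
                    best = (score, top, left, size)
    return best
```

In this sketch no character segmentation is performed: every window position is scored against every reference size, so the character size does not have to be known in advance. Smoothing both the frame and the references is one plausible way to make the score less sensitive to small size and position mismatches; whether the paper applies the filter in the same way is an assumption.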