In this paper, a mosaicing-by-recognition technique is proposed where video mosaicing and text recognition are simultaneously and collaboratively optimized in a one-step manner. Specifically, multiple frames capturing a long text line are optimally concatenated with a guide of the text recognition framework. In this optimization process, rotation, scaling, vertical shift, and speed fluctuation, which often appear in video frames captured by hand-held cameras, are compensated. The optimization is performed by a DPbased algorithm. The results of experiments to evaluate not only the accuracy of text recognition but also that of video mosaicing indicates that the proposed technique is practical and can provide reasonable results in most cases.