Accurate recognition of text in images and videos using analytical character recognition methods is often very difficult, even when the text has been correctly localized, segmented and binarized. This is mainly due to variation in the text itself, such as differing fonts, and to noise in the image inherited from the complex background. In this paper, we address the problem of comparing text images for content-based retrieval by presenting a holistic approach. First, the shape of the text is represented by estimating the salient points in the text image. Then, shape alignment methods are used to establish the correspondence between the salient points of two images. Finally, a measure is proposed to compute the dissimilarity between two text images based on the resulting correspondence. Empirical evaluation shows that the proposed holistic comparison method performs very well.
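To make the three steps concrete, the following is a minimal illustrative sketch and not the method evaluated in this paper. It assumes the salient points have already been extracted as (x, y) coordinates, and it swaps in deliberately simple stand-ins: centroid and scale normalization as the alignment, greedy nearest-neighbour pairing as the correspondence, and mean matched distance plus a penalty for unmatched points as the dissimilarity. The function names and the `unmatched_penalty` parameter are hypothetical.

```python
import math


def normalize(points):
    """Center points on their centroid and scale to unit RMS radius.

    Assumes a non-empty list of (x, y) tuples.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Fall back to a scale of 1.0 if all points coincide.
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def correspond(a, b):
    """Greedy one-to-one matching: repeatedly pair the closest unmatched points.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    candidates = sorted(
        (math.dist(p, q), i, j) for i, p in enumerate(a) for j, q in enumerate(b)
    )
    used_a, used_b, matches = set(), set(), []
    for d, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j, d))
    return matches


def dissimilarity(points_a, points_b, unmatched_penalty=1.0):
    """Mean matched distance after normalization, penalizing unmatched points.

    unmatched_penalty is a hypothetical parameter, not taken from the paper.
    """
    a, b = normalize(points_a), normalize(points_b)
    matches = correspond(a, b)
    unmatched = abs(len(a) - len(b))
    total = sum(d for _, _, d in matches) + unmatched_penalty * unmatched
    return total / max(len(a), len(b))
```

Under these assumptions, two point sets that differ only by translation and uniform scale yield a dissimilarity close to zero, while point sets with different layouts or different numbers of points yield larger values. A retrieval system could rank candidate text images by this score in ascending order.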