Scene text images feature a great variety of font styles but little data in any given query. Recognition methods must therefore be robust to this variety or adapt to the characteristics of the query data. To achieve this, we augment a semi-Markov model, which integrates character segmentation and recognition, with a bigram model of character widths. By softly promoting segmentations whose font metrics are consistent with those learned from examples, we exploit the limited information available while avoiding error-prone direct estimates and hard constraints. Incorporating character width bigrams in this fashion improves recognition on low-resolution images of signs containing text in many fonts.
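The abstract does not give the model's exact form. The following is a minimal sketch, under our own assumptions, of how a character-width bigram can be folded into semi-Markov decoding. The decoder is a segment-level Viterbi search whose state is (end column, width of the last segment), because the width bigram only couples adjacent segments. The recognizer interface `score(start, end)`, the Gaussian-on-log-width-ratio term `width_bigram_logp` and its `sigma`, the `bigram_weight` knob, and the toy scores are all hypothetical illustrations, not the paper's learned models.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical recognizer interface: given columns [start, end), return the
# best character label for that segment and its log-score.
SegmentScorer = Callable[[int, int], Tuple[str, float]]


def width_bigram_logp(prev_w: int, cur_w: int, sigma: float = 0.3) -> float:
    """Assumed width-bigram term: a Gaussian log-penalty on log(cur_w / prev_w).

    Similar consecutive widths are softly favored; no width pair is forbidden.
    """
    r = math.log(cur_w / prev_w)
    return -0.5 * (r / sigma) ** 2


def segment_and_recognize(
    n_cols: int, score: SegmentScorer, min_w: int, max_w: int,
    bigram_weight: float = 1.0,
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Semi-Markov Viterbi that jointly picks segment boundaries and labels.

    best[end][w] holds (score, back-pointer, label) for the best parse of
    columns [0, end) whose last segment has width w.
    """
    assert 1 <= min_w <= max_w
    best: Dict[int, Dict[int, Tuple[float, Optional[Tuple[int, int]], str]]] = {
        e: {} for e in range(n_cols + 1)
    }
    # First segment: no previous width, so no bigram term.
    for w in range(min_w, min(max_w, n_cols) + 1):
        label, s = score(0, w)
        best[w][w] = (s, None, label)

    # Extend partial parses in order of increasing end column.
    for end in range(1, n_cols + 1):
        for prev_w, (prev_s, _, _) in best[end].items():
            for w in range(min_w, max_w + 1):
                new_end = end + w
                if new_end > n_cols:
                    break
                label, s = score(end, new_end)
                total = prev_s + s + bigram_weight * width_bigram_logp(prev_w, w)
                cur = best[new_end].get(w)
                if cur is None or total > cur[0]:
                    best[new_end][w] = (total, (end, prev_w), label)

    if not best[n_cols]:
        raise ValueError("no segmentation fits the width limits")
    # Pick the best final state, then follow back-pointers.
    w, (total, _, _) = max(best[n_cols].items(), key=lambda kv: kv[1][0])
    segments: List[Tuple[int, int, str]] = []
    end = n_cols
    while True:
        _, back, label = best[end][w]
        segments.append((end - w, end, label))
        if back is None:
            break
        end, w = back
    segments.reverse()
    return total, segments


# Toy, made-up scores for a 9-column image: three width-3 glyphs "T", "O", "P",
# or a confusable wide "M" followed by a narrow "P". Other segments score -5.
_TOY = {(0, 3): ("T", -1.0), (3, 6): ("O", -1.0), (6, 9): ("P", -1.0),
        (0, 6): ("M", -0.5)}


def toy_score(start: int, end: int) -> Tuple[str, float]:
    return _TOY.get((start, end), ("?", -5.0))
```

With `bigram_weight=0.0`, `segment_and_recognize(9, toy_score, 2, 6, bigram_weight=0.0)` follows the recognizer alone and returns the inconsistent-width reading "MP" (total -1.5). With the width term enabled, the same call returns "TOP" (total -3.0, versus about -4.17 for "MP"), because a width-6 glyph followed by a width-3 glyph is penalized but not forbidden. This mirrors the "soft promotion" idea: the bigram only reweights competing segmentations and never imposes a hard constraint.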