Segmentation and Normalisation in Grapheme Codebooks

Abstract: The grapheme codebook is a high-performing technique for offline writer identification. This paper considers whether the de facto standards for initial grapheme extraction are optimal for both modern and historical datasets. We examine the construction and representation of the graphemes that comprise the codebook, testing three segmentation methods and two grapheme size normalisation methods on two datasets: a 93-writer IAM dataset, and a 43-writer medieval English dataset. The standard minima-split segmentation is compared to a complementary segmentation method that preserves ligature shapes, as well as the union of both these methods. Classification performance for each method is compared on a range of codebook sizes. We demonstrate that grapheme aspect-ratio is not always a writer-specific feature, and that preserving the character body shape in segmentation is more informative than preserving cursive text ligatures.
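
The abstract does not name the two size normalisation methods it compares. As a rough illustration of why aspect ratio can matter, the sketch below contrasts two plausible options: stretching every grapheme to a fixed square (discarding aspect ratio) and scaling it uniformly before centre-padding (preserving aspect ratio). Both functions, their names, and the nearest-neighbour resampling are assumptions made for this example, not the authors' implementation.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of pixel values (hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalise_stretch(img, size):
    """Assumed method A: stretch to size x size, discarding the aspect ratio."""
    return resize_nearest(img, size, size)


def normalise_keep_aspect(img, size, background=0):
    """Assumed method B: scale the longer side to `size`, then centre-pad."""
    in_h, in_w = len(img), len(img[0])
    scale = size / max(in_h, in_w)
    new_h = max(1, round(in_h * scale))
    new_w = max(1, round(in_w * scale))
    scaled = resize_nearest(img, new_h, new_w)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas = [[background] * size for _ in range(size)]
    for r in range(new_h):
        # Copy each scaled row into the centred region of the canvas.
        canvas[top + r][left:left + new_w] = scaled[r]
    return canvas


# A wide 2 x 4 grapheme filled with ink (1 = ink, 0 = background).
wide = [[1, 1, 1, 1],
        [1, 1, 1, 1]]
stretched = normalise_stretch(wide, 4)
padded = normalise_keep_aspect(wide, 4)
```

Under the stretch variant the wide grapheme fills the whole square, so its original proportions are lost; under the padded variant the background rows above and below retain them. Which behaviour helps writer identification is the empirical question the paper addresses.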

Tara Gilliam, Richard C. Wilson, John A. Clark

Document Analysis | Graphemes | ICDAR 2011 | Segmentation Method | Segmentation Methods |

Post Info

Added	24 Dec 2011
Updated	24 Dec 2011
Type	Conference
Year	2011
Where	ICDAR
Authors	Tara Gilliam, Richard C. Wilson, John A. Clark

Sciweavers
