We discuss problems in developing policies for ground truthing document images for pixel-accurate segmentation. First, we describe ground truthing policies that apply at four different scales: (1) paragraph, (2) text line, (3) character, and (4) pixel. We then analyze difficult or ambiguous cases that will challenge any policy, such as blank space and overlapping content. Experiments show the benefit of using "tighter" zones that capture more detail (e.g., at the text-line level instead of the paragraph level): in recent experiments, tighter ground truth significantly improved classification results, by 45%. It is important to recognize that a pixel-accurate segmentation can be better than manually obtained ground truth. In practice, perfectly accurate pixel-level ground truth may not be achievable, but we believe it is important to explore methods for semi-automatically improving existing ground truth.
Michael A. Moll, Henry S. Baird, Chang An