A hierarchical algorithm is presented for determining the similarity and equivalence of document images. Features extracted from the CCIIT fax-compressed representations of two images are compared to determine their visual similarity and whether they are equivalent. Passcodes in the compressed data are used as features. A fixed grid is imposed on the image and a feature vector is derived from the number of pass codes in each grid cell. The features vectors are compared to locate a group of documents that are visually similar to the input image. The equivalence of two documents is determined by applying the Hausdorff distance to the two-dimensional arrangement of passcodes in small patches of each image.
Jonathan J. Hull, John F. Cullen