Document image matching is a key technique for document registration and retrieval. In this paper, we propose a new matching algorithm based on a document component block list and a component block tree. Our method effectively exploits both the local information of each page block and the global information of the page layout, and it is robust to image distortion, filled-in text, and noise. The algorithm is then refined and applied to automatic data extraction from column forms. A demonstration software package has been developed.