This paper presents our work on automatically locating charts in document pages, an important stage in the chart image recognition and understanding system we are developing. The task has two sub-goals: locating figure blocks in a given document image, and building a classifier to differentiate charts from non-chart figures. For the first sub-goal, in addition to traditional logical block labelling, relevant text blocks such as text descriptions and labels for the candidates must be included in the located figure blocks to facilitate interpretation in the following stages. For the second sub-goal, we propose a set of simple statistical features for building the classifier. We tested our system on the entire collection of scanned journal pages in the University of Washington database I. The experimental results are discussed in this paper.
W. Huang, C.-L. Tan