Abstract. We present a method for extracting geometric and relational structures from raw intensity data. On the one hand, low-level image processing extracts isolated features. On the other hand, image interpretation uses sophisticated object descriptions in representation frameworks such as semantic networks. We suggest an intermediate-level description between low- and high-level vision. This description is produced by grouping image features into increasingly abstract structures. First, we motivate our choice of what should be represented, and we stress the limitations inherent in the use of sensory data. Second, we describe our current implementation and illustrate it with various examples.