In this paper, we describe a new design for a system that recognizes objects in a single image of an indoor scene containing complex occlusions. First, the system estimates the 3D structure of each object by qualitatively fitting a 3D structure model to the image. Next, by checking the supporting relations between objects, it eliminates object candidates that cannot exist and infers the actual objects from their parts in the image. Finally, it recognizes the objects that are consistent with each other. We implemented the system as a multi-agent-based image understanding system. This paper outlines the system and presents experimental results.