We propose a scheme, called the tower of knowledge (ToK), for interpreting 3D scenes. The ToK encapsulates the causal dependencies between the appearance of an object and its functionality. We demonstrate the scheme by using utility theory to label the components of a 3D model of a building reconstructed from multiple-view images.
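To illustrate what labelling by utility theory can mean in practice, the following is a minimal sketch of a generic maximum-expected-utility decision rule. It is not the formulation used in this work: the label set, the posterior probabilities, the utility values, and the function names `expected_utility` and `best_label` are all hypothetical and chosen only for illustration.

```python
# Generic maximum-expected-utility labelling (illustrative assumption,
# not the formulation of the ToK scheme itself).

def expected_utility(label, posterior, utility):
    """Expected utility of assigning `label`, averaged over the true labels."""
    return sum(p * utility[(label, true)] for true, p in posterior.items())

def best_label(posterior, utility, labels):
    """Return the label with the highest expected utility."""
    return max(labels, key=lambda l: expected_utility(l, posterior, utility))

# Hypothetical component labels and posterior for one building component.
labels = ["window", "door", "balcony"]
posterior = {"window": 0.5, "door": 0.3, "balcony": 0.2}

# Symmetric utility: 1 for a correct label, 0 otherwise.
zero_one = {(a, t): (1.0 if a == t else 0.0) for a in labels for t in labels}

# Asymmetric utility (hypothetical): calling a true door a window is penalised.
asymmetric = dict(zero_one)
asymmetric[("window", "door")] = -2.0

map_choice = best_label(posterior, zero_one, labels)        # most probable label
utility_choice = best_label(posterior, asymmetric, labels)  # utility-aware label
```

With the symmetric utility the rule reduces to picking the most probable label (`"window"`), whereas the asymmetric utility shifts the decision to `"door"`, which shows how a utility function can encode the consequences of a wrong label rather than its probability alone.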