This paper investigates the integration of verbal and visual information for describing (explaining) the content of images formed by three-dimensional geometrical figures, from a hybrid neurosymbolic perspective. The results of visual object classifications, which involve both the top-down application of stored knowledge and bottom-up image processing, are effectively explained by relying on both words and pictures. The latter seem particularly suitable for explanations concerning high-level visual tasks that combine top-down reasoning with bottom-up perceptual processes.