Sciweavers

ACL
2015

Describing Images using Inferred Visual Dependency Representations

The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in an image. In this paper we present an approach to training a VDR Parsing Model without the extensive human supervision used in previous work. Our approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use successful detections to produce training data. The description of an unseen image is produced by first predicting its VDR over automatically detected objects, and then generating the text with a template-based generation model using the predicted VDR. The performance of our approach is comparable to a state-of-the-art multimodal deep neural network in images depicting actions.
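To make the two-stage idea in the abstract concrete (predict a VDR over detected objects, then fill a template from it), here is a minimal toy sketch. It is not the authors' implementation: the `Box` class, the geometric rule in `spatial_relation`, the relation labels, and the sentence template are all assumptions made for illustration, and the hand-written rule merely stands in for the trained VDR Parsing Model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected object: label plus bounding box (hypothetical format)."""
    label: str
    x: float  # left edge
    y: float  # top edge (image coordinates: y grows downward)
    w: float
    h: float

    @property
    def centre(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def spatial_relation(a: Box, b: Box) -> str:
    """Toy rule standing in for a learned parser: compare box centres."""
    ax, ay = a.centre
    bx, by = b.centre
    dx, dy = ax - bx, ay - by
    if abs(dy) > abs(dx):
        # Smaller y means higher up in the image.
        return "above" if dy < 0 else "below"
    return "beside"


def predict_vdr(boxes):
    """Link the first box to every other box as (subject, relation, object)."""
    head, *rest = boxes
    return [(head.label, spatial_relation(head, other), other.label)
            for other in rest]


def describe(vdr):
    """Template-based generation: one fixed sentence pattern per edge."""
    return " ".join(f"The {s} is {r} the {o}." for s, r, o in vdr)


detections = [
    Box("person", 40, 10, 30, 60),
    Box("bike", 35, 60, 50, 40),
    Box("tree", 150, 20, 40, 70),
]
vdr = predict_vdr(detections)
print(describe(vdr))
```

In the approach the abstract describes, the relations would be predicted by a parsing model trained on detections that match objects mentioned in the descriptions; the fixed rule here only shows where that model would sit in the pipeline.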
Desmond Elliott, Arjen de Vries
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ACL
Authors Desmond Elliott, Arjen de Vries