This paper discusses the basic structures necessary for the generation of reference to objects in a visual scene. We construct a study designed to elicit naturalistic referring expressions to relatively complex objects, and find aspects of reference that have not been accounted for in work on Referring Expression Generation (REG). This includes reference to object parts, size comparisons without crisp measurements, and the use of analogies. By drawing on research in cognitive science and psycholinguistics, we begin developing the input structure and background knowledge necessary for an algorithm capable of generating the kinds of reference we observe.