Following the ecological approach to visual perception, this paper presents a framework that emphasizes the role of vision in referring actions. In particular, affordances are used to explain gesture variability in multimodal human-computer interaction. This proposal is consistent with empirical findings obtained in different simulation studies, which show that referring gestures are determined by the mutuality of information coming from the target and the set of movements available to the speaker. A prototype that follows anthropomorphic perceptual principles to analyze gestures has been developed and tested in preliminary computational validations.

1 Multimodal systems

Sometimes a gesture can be better than a thousand words. This happens whenever we want to indicate visual objects for which a direct and unambiguous linguistic reference is not easily accessible. Gestures are an efficient means of coping with the complexity of the visual world, a complexity that cannot be completely co...