In human face-to-face interaction, participants can rely on a wealth of audio-visual information to interpret their interlocutors' communicative intentions, and this information contributes strongly to the success of communication. Modelling these typically human abilities is a central objective in human communication research, including technological applications such as human-machine interaction. In this pilot study we explore the possibility of using audio-visual parameters to describe and measure the differences perceived in interlocutors' communicative behaviours. Preliminary results derived from the multimodal analysis of a single subject suggest that measuring the distribution of temporally co-occurring prosodic and hand gesture events helps account for such perceived differences. Moreover, as far as gesture events are concerned, we observed that the relevant information is not simply to be found in the occurrences of sing...
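The co-occurrence measure mentioned above can be illustrated with a minimal sketch. The interval-based annotation format and all names below (Event, overlaps, co_occurrence_counts, the example labels) are illustrative assumptions, not the study's actual annotation scheme:

```python
from typing import Dict, List, Tuple

# An annotated event: (start_time_s, end_time_s, label).
Event = Tuple[float, float, str]

def overlaps(a: Event, b: Event) -> bool:
    """Two events co-occur if their time intervals intersect."""
    return a[0] < b[1] and b[0] < a[1]

def co_occurrence_counts(prosodic: List[Event],
                         gestural: List[Event]) -> Dict[Tuple[str, str], int]:
    """Count how often each (prosodic label, gesture label) pair
    co-occurs in time, giving a simple distribution over pairs."""
    counts: Dict[Tuple[str, str], int] = {}
    for p in prosodic:
        for g in gestural:
            if overlaps(p, g):
                pair = (p[2], g[2])
                counts[pair] = counts.get(pair, 0) + 1
    return counts

# Hypothetical annotations (times in seconds).
prosodic_events = [(0.4, 0.9, "pitch_accent"), (1.6, 2.1, "boundary_tone")]
gesture_events = [(0.3, 1.0, "beat"), (1.5, 2.4, "deictic")]

print(co_occurrence_counts(prosodic_events, gesture_events))
# {('pitch_accent', 'beat'): 1, ('boundary_tone', 'deictic'): 1}
```

Such a distribution of co-occurring event pairs is one straightforward way to quantify the kind of speaker-specific multimodal patterning the study investigates.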