The paper proposes a set of principles and a general architecture that may explain how language and meaning originate and complexify in a group of physically grounded distributed agents. An experimental setup is introduced for concretising and validating specific mechanisms based on these principles. The setup consists of two robotic heads that watch a scene in which a robot moves around in its ecosystem. First results are reported from experiments showing the emergence of distinctions, of a lexicon, and of primitive syntactic structures.