In this paper we present a formal approach to the representation of multimodal information. Thanks to the use of typed feature structures and hypergraphs, this approach generalizes existing ones (typically annotation graphs) in several ways. First, it proposes a homogeneous representation of different types of information (nodes and relations) coming from different domains (speech, gestures). Second, it makes it possible to specify constraints representing the interaction between the different modalities, with a view to developing multimodal grammars.