Abstract. In this paper we address the problem of comparing multimedia documents that may be described according to different reference models. If we regard presentations as collections of media items and of constraints among them, expressed according to their reference model, then the presentations must first be translated into a common formalism before we can compare their temporal behavior and detect whether they share a common component (intersection), whether one is contained in the other (inclusion), or whether they exhibit the same temporal evolution over time (equivalence). We propose the use of automata to describe the temporal evolution of a document, and we adopt the SMIL language as a case study, since this standard allows the same behavior to be described with different sets of tags. For behaviorally equivalent SMIL documents, we also propose an algorithm that extracts a canonical form representing their common behavior.
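As a brief illustrative sketch of the phenomenon the abstract refers to (the media file names and timing values below are hypothetical, chosen only for illustration), the following two SMIL fragments describe the same temporal behavior with different sets of tags: both play clip `a.mp4` for five seconds and then start clip `b.mp4`.

```xml
<!-- Version 1: sequential composition with <seq>;
     b.mp4 begins when a.mp4 ends -->
<seq>
  <video src="a.mp4" dur="5s"/>
  <video src="b.mp4"/>
</seq>

<!-- Version 2: parallel composition with an explicit begin offset;
     b.mp4 begins 5s after the <par> starts, i.e., when a.mp4 ends -->
<par>
  <video src="a.mp4" dur="5s"/>
  <video src="b.mp4" begin="5s"/>
</par>
```

Since the two fragments have the same temporal evolution, an automaton-based comparison should classify them as equivalent, and a canonical form would select a single representative for this behavior.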