Computer-based presentation systems enable effective and dynamic presentation styles that incorporate multiple media. Obvious examples are animated user interface agents that verbally comment on multimedia objects displayed on the screen while performing cross-media and cross-window pointing gestures. The design of such presentations must account for the temporal coordination of media output and the agent's behavior. In this paper we describe a new presentation system that not only creates the multimedia objects to be presented, but also generates a script for presenting the material to the user. In our system, this script is forwarded to an animated presentation agent that runs the presentation. The paper details the kernel of the system, a component for planning temporally coordinated multimedia.

Figure 1: Verbal Annotation of Graphical Objects