This paper describes an augmented reality (AR) multimodal interface that uses speech and paddle gestures for interaction. The application allows users to intuitively arrange virtual furniture in a virtual room using a combination of speech commands and gestures made with a real paddle. Unlike other multimodal AR applications, our multimodal fusion combines time-based and semantic techniques to disambiguate the user's speech and gesture input. We describe our AR multimodal interface architecture and discuss how the multimodal inputs are semantically integrated into a single interpretation by considering the input time stamps, the object properties, and the user context.

CR Categories and Subject Descriptors: H.5.1 (Multimedia Information Systems): Artificial, augmented, and virtual realities; H.5.2 (User Interfaces): Auditory (non-speech) feedback, Graphical user interfaces (GUI), Interaction styles, Natural language, Voice I/O.
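As a rough illustration of the kind of fusion described above, the sketch below pairs a speech command with the nearest paddle gesture inside a small time window and accepts the pairing only if the referenced object's properties are compatible with the command. It is a minimal sketch in Python; the class names, the TIME_WINDOW threshold, and the property checks are all hypothetical, not the paper's actual implementation.

```python
from dataclasses import dataclass

TIME_WINDOW = 1.5  # seconds; hypothetical window for time-based fusion

@dataclass
class SpeechInput:
    timestamp: float
    command: str              # e.g. "move", "rotate", "delete"
    object_name: str | None   # object mentioned in the utterance, if any

@dataclass
class PaddleGesture:
    timestamp: float
    gesture: str              # e.g. "point", "tilt", "shake"
    target_object: str        # object the paddle is currently over

@dataclass
class VirtualObject:
    name: str
    movable: bool = True
    rotatable: bool = True

def semantically_compatible(cmd: str, obj: VirtualObject) -> bool:
    """Check that the spoken command makes sense for the gestured object."""
    if cmd == "move":
        return obj.movable
    if cmd == "rotate":
        return obj.rotatable
    return True  # other commands apply to any object in this sketch

def fuse(speech: SpeechInput, gestures: list[PaddleGesture],
         scene: dict[str, VirtualObject]):
    """Return a single (command, object) interpretation, or None."""
    # 1. Time-based filtering: keep gestures close to the utterance.
    nearby = [g for g in gestures
              if abs(g.timestamp - speech.timestamp) <= TIME_WINDOW]
    if not nearby:
        return None
    # 2. Prefer the gesture closest in time to the speech input.
    gesture = min(nearby, key=lambda g: abs(g.timestamp - speech.timestamp))
    # 3. Resolve the object: a spoken reference wins, else the paddle target.
    obj = scene.get(speech.object_name or gesture.target_object)
    if obj is None:
        return None
    # 4. Semantic check against the object's properties (user/object context).
    if not semantically_compatible(speech.command, obj):
        return None
    return speech.command, obj.name

# Example: the user says "move this" while pointing the paddle at the couch.
scene = {"couch": VirtualObject("couch"),
         "wall": VirtualObject("wall", movable=False)}
speech = SpeechInput(timestamp=10.2, command="move", object_name=None)
gestures = [PaddleGesture(timestamp=10.5, gesture="point", target_object="couch")]
print(fuse(speech, gestures, scene))  # -> ('move', 'couch')
```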