Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston 1998 presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multidimensional chart parser to compose inputs. This approach is highly expressive and supports a broad class of interfaces, but offers only limited potential for mutual compensation among the input modes, is subject to significant concerns in terms of computational complexity, and complicates selection among alternative multimodal interpretations of the input. In this paper, we present an alternative approach in which multimodal parsing and understanding are achieved using a weighted finite-state device which takes speech and gesture streams as inputs and outputs their joint interpretation. This approach is significantly more efficient, enables tight coupling of multimodal underst...
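To make the idea of a weighted finite-state device over speech and gesture concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical toy device whose transitions each carry a speech symbol, a gesture symbol, a meaning output, and a non-negative weight; the states, the symbols (`Gp`, `Go`), the weights, and the function name `interpret` are all invented for illustration. A lowest-cost search over the two input streams then selects one joint interpretation.

```python
import heapq

EPS = None  # epsilon: consume nothing on that input tape, or emit nothing

# Hypothetical toy device (not taken from the paper). Each transition is
# (source, speech symbol, gesture symbol, meaning output, weight, target).
TRANSITIONS = [
    (0, "email", EPS, "email(", 0.0, 1),
    (1, "this", EPS, EPS, 0.0, 2),
    (2, "person", "Gp", "person", 0.0, 3),        # gesture at a person
    (2, "person", "Go", "person", 2.0, 3),        # illustrative mismatch penalty
    (2, "organization", "Go", "org", 0.0, 3),     # gesture at an organization
    (2, "organization", "Gp", "org", 2.0, 3),     # illustrative mismatch penalty
    (3, EPS, EPS, ")", 0.0, 4),
]
START = 0
FINAL = {4}


def interpret(speech, gesture):
    """Return (cost, meaning) for the cheapest accepting path, or None."""
    # Dijkstra-style search over configurations
    # (state, position in speech, position in gesture).
    heap = [(0.0, START, 0, 0, ())]
    seen = set()
    while heap:
        cost, state, i, j, out = heapq.heappop(heap)
        if state in FINAL and i == len(speech) and j == len(gesture):
            return cost, "".join(out)
        if (state, i, j) in seen:
            continue
        seen.add((state, i, j))
        for src, s, g, m, w, dst in TRANSITIONS:
            if src != state:
                continue
            # A non-epsilon label must match the next symbol on its tape.
            if s is not EPS and (i >= len(speech) or speech[i] != s):
                continue
            if g is not EPS and (j >= len(gesture) or gesture[j] != g):
                continue
            next_i = i + (s is not EPS)
            next_j = j + (g is not EPS)
            next_out = out + ((m,) if m is not EPS else ())
            heapq.heappush(heap, (cost + w, dst, next_i, next_j, next_out))
    return None


print(interpret(["email", "this", "person"], ["Gp"]))  # (0.0, 'email(person)')
print(interpret(["email", "this", "person"], ["Go"]))  # (2.0, 'email(person)')
```

In this sketch the weights let a speech word still combine with a mismatched gesture at a higher cost, which is one simple way to picture selection among alternative interpretations; an input with no gesture at all is rejected and yields `None`.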