To support context-based multimodal interpretation in conversational systems, we have developed a semantics-based representation to capture salient information from user inputs and the overall conversation. In particular, we present three unique characteristics: finegrained semantic models, flexible composition of feature structures, and consistent representation at multiple levels. This representation allows our system to use rich contexts to resolve ambiguities, infer unspecified information, and improve multimodal alignment. As a result, our system is able to enhance understanding of multimodal inputs including those abbreviated, imprecise, or complex ones.
Joyce Y. Chai