This paper describes our work on usage pattern analysis and the development of a latent semantic analysis framework for interpreting multimodal user input consisting of speech and pen gestures. We have designed and collected a multimodal corpus of navigational inquiries. Each modality carries semantics related to a domain-specific task goal, and each inquiry is manually annotated with a task goal based on these semantics. Multimodal input usually has a simpler syntactic structure than unimodal input, and the order of semantic constituents differs between multimodal and unimodal inputs. We therefore propose to derive the latent semantics of the multimodal inputs using latent semantic modeling (LSM). To achieve this, we parse the recognized Chinese spoken input for spoken locative references (SLRs). These SLRs are then aligned with their corresponding pen gesture(s). We then characterize the cross-modal integration pattern as 3-tuple multimodal terms with the SLR, ...
Pui-Yu Hui, Wai Kit Lo, Helen M. Meng
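As a rough illustration of how LSM could relate 3-tuple multimodal terms to inquiries, the following Python sketch builds a small term-inquiry co-occurrence matrix, reduces it with a truncated SVD, and compares inquiries by cosine similarity in the latent space. The example terms, counts, and the fold-in/cosine helpers are hypothetical and are not taken from the paper; they only show the generic LSM machinery under those assumptions.

```python
import numpy as np

# Hypothetical vocabulary of 3-tuple multimodal terms, e.g.
# (spoken locative reference, number of SLRs, pen gesture type).
terms = [
    ("this_hotel", 1, "point"),
    ("these_restaurants", 2, "circle"),
    ("here", 1, "point"),
    ("that_station", 1, "point"),
]

# Term-inquiry matrix: rows are multimodal terms, columns are
# multimodal inquiries (documents). Counts are illustrative only.
W = np.array([
    [2, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 2, 1],
    [0, 0, 0, 2],
], dtype=float)

# Latent semantic modeling via truncated singular value decomposition.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2                                  # dimensionality of the latent space
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each training inquiry represented in the k-dimensional latent space.
inquiry_vectors = (np.diag(s_k) @ Vt_k).T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fold a new inquiry's term counts into the latent space and find the
# most similar annotated inquiry (whose task goal could then be reused).
new_inquiry_counts = np.array([1, 0, 1, 0], dtype=float)
folded = new_inquiry_counts @ U_k      # projection onto latent dimensions
sims = [cosine(folded, v) for v in inquiry_vectors]
print("most similar training inquiry:", int(np.argmax(sims)))
```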