This paper presents XISM, a multimodal crisis management system that processes a user's natural gesture and speech commands to manage complex, dynamic emergency scenarios on a large display. The prototype demonstrates how unconstrained free-hand gestures and speech can be incorporated into a real-time interactive interface. The paper describes the design of the XISM system, focusing on the extraction and fusion of the gesture and speech modalities to support more natural interactive behavior. Performance characteristics of the current prototype and directions for future work are discussed. A series of user studies indicated positive responses with respect to the ease of interacting with the current system.