Co-located collaborators often work over physical tabletops with rich geospatial information. Previous research shows that people use gestures and speech as they interact with artefacts on the table and communicate with one another. With the advent of large multi-touch surfaces, developers are now applying this knowledge to create appropriate technical innovations in digital table design. Yet they are limited by the difficulty of building a truly useful collaborative application from the ground up. In this paper, we circumvent this difficulty by: (a) building a multimodal speech and gesture engine around the Diamond Touch multi-user surface, and (b) wrapping existing, widely-used off-the-shelf single-user interactive spatial applications with a multimodal interface created from this engine. Through case studies of two quite different geospatial systems Google Earth and Warcraft III we show the new functionalities, feasibility and limitations of leveraging such single-user applications...