We present a novel framework for easily creating interactive, platform-independent voice services with an animated 3D talking-head interface on mobile phones. The framework supports automated multimodal interaction using speech and 3D graphics. We address the difficulty of synchronizing the audio stream with the animation and discuss alternatives for distributing control of the animation and the application logic across the network. We document the ability of modern mobile devices to handle such applications and show that, in terms of power consumption, rendering on the mobile phone is preferable to streaming from the server. The presented tools will support developers and researchers in future research and usability studies on mobile talking-head applications, which may be used, for example, in entertainment, commerce, health care, or education.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Natural language, Voice I/O; I.3 [Three-Dimensional Graphics and Realism]: ...