The interactive game environment, Ghost in the Cave, presented in this short paper, is a work still in progress. The game involves participants in an activity using non-verbal emotional expressions. Two teams use expressive gestures in either voice or body movements to compete. Each team has an avatar controlled either by singing into a microphone or by moving in front of a video camera. Participants/players control their avatars by using acoustical or motion cues. The avatar is navigated in a 3D distributed virtual environment using the Octagon server and player system. The voice input is processed using a musical cue analysis module yielding performance variables such as tempo, sound level and articulation as well as an emotional prediction. Similarly, movements captured from a video camera are analyzed in terms of different movement cues. The target group is young teenagers and the main purpose to encourage creative expressions through new forms of collaboration.