As environments become smarter with advances in ubiquitous computing technology, researchers are struggling to satisfy users' diverse and sophisticated demands. The aim of the present work is to enable multiple persons in an interactive virtual environment to interact with virtual agents simultaneously and conveniently. To this end, we propose a real-time system that robustly tracks multiple persons in virtual environments and recognizes their actions from image sequences acquired by a single fixed camera. The proposed system is composed of three components: blob extraction, object tracking, and human action recognition. Given an image, we extract blobs using the Mixture of Gaussians algorithm with a hierarchical data structure, and we additionally remove shadows and highlights to obtain a more accurate object silhouette. We then track multiple objects using a motion-based object model and an inference graph to handle grouping and fragmentation problems. F...