There are many situations in which people need to interact with a personal computer without being able to use traditional input devices such as a keyboard or a mouse. In recent years, various alternatives to these classical input devices, along with novel interaction paradigms, have been proposed. In particular, multimodal interaction has been proposed to overcome the limits of each input channel taken alone. In this paper we propose a multimodal system based on the integration of speech- and gaze-based inputs for interaction with a real desktop environment. A grammar is generated in real time to limit the vocal vocabulary based on the fixated area. A disambiguation method is used for inherently ambiguous vocal commands, and the tests performed show its efficiency.

CR Categories: H.5.2 [Information Systems]: Information Interfaces and Presentation--User Interfaces; H.4 [Information Systems]: Information Systems Applications