This work introduces a robot-driven camera controlled by speech. The SIMIS database, comprising 20 recordings of real-life surgical operations, serves as the basis for analysis and noise modelling. To overcome the low recognition performance caused by high noise levels during operations, the vocabulary was kept highly restricted and several noise reduction methods were investigated. We show that feature enhancement techniques, such as Histogram Equalization or a Switching Linear Dynamic Model that captures the dynamics of speech, yield a substantial improvement in recognition accuracy. Under severe operating conditions, with all noise types that occur in the recordings present, the mean accuracy can be raised from 89.67 %
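To illustrate the general idea behind Histogram Equalization as a feature enhancement step, the following is a minimal sketch and not the configuration used in this work. It assumes that each feature dimension is equalized independently over the frames of one utterance, and that the reference distribution is a standard normal. The function name `histogram_equalize` is hypothetical. Each value is replaced by the reference quantile that matches its empirical CDF, so noise-induced shifts and scalings in the feature distribution are largely removed.

```python
from statistics import NormalDist


def histogram_equalize(frames):
    """Hypothetical HEQ sketch: map each feature dimension onto N(0, 1).

    frames: list of feature vectors (one per frame), all of equal length.
    Returns a new list of equalized feature vectors in the same frame order.
    Assumption: a standard normal reference distribution and per-utterance
    statistics; ties are ranked in sort order.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_dims = len(frames[0])
    reference = NormalDist(mu=0.0, sigma=1.0)
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        # Rank the frames by the value of dimension d.
        order = sorted(range(n_frames), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Empirical CDF, offset by 0.5 so it stays strictly inside (0, 1).
            cdf = (rank + 0.5) / n_frames
            # Replace the value by the matching quantile of the reference.
            out[t][d] = reference.inv_cdf(cdf)
    return out
```

Because only ranks enter the mapping, any monotonic distortion of a feature dimension (for example, an additive offset combined with a positive scaling) produces the same equalized output, which is why this family of methods is attractive for noisy conditions.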