In this paper, we consider a vision-based system that interprets a user's gestures in real time to manipulate windows and objects within a graphical user interface. A hand segmentation procedure first extracts binary hand blob(s) from each frame of the acquired image sequence. Fourier descriptors represent the shape of the hand blobs and are input to radial basis function (RBF) network(s) for pose classification. The pose likelihood vector output by the RBF network is used as input to the gesture recognizer, along with motion information. Gesture recognition performance using hidden Markov models (HMMs) and recurrent neural networks (RNNs) was investigated. Test results showed that the continuous HMM yielded the best performance, with a gesture recognition rate of 90.2%. Experiments combining the continuous HMMs and RNNs revealed that a linear combination of the two classifiers improved the