We propose a solution to the problem of object recognition from a continuous video sequence containing multiple views of an object. Object models are first acquired from images of the objects taken from different views. Recognition from video sequences is then performed using a multiple hypothesis approach. Appearance similarity and pose transition smoothness constraints are used to estimate, at each time instant, the probability that the measurement was generated by a given model hypothesis. A smooth gradient direction feature that is quasi-invariant to illumination changes and noise represents the appearance of the object. The pose of the object at each time instant is modelled as a von Mises-Fisher distribution. Recognition is achieved by choosing the hypothesis set that has accumulated the maximum evidence by the end of the sequence. Detailed experiments demonstrate the viability of the proposed approach.
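The evidence accumulation described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: it assumes that per-frame appearance log-likelihoods are already available for each stored view of each model, that poses are restricted to the stored view directions on the unit sphere, and that a Viterbi-style maximisation stands in for the multiple hypothesis bookkeeping. The names `vmf_log_pdf`, `accumulate_evidence`, `recognise`, and the concentration value `kappa=5.0` are hypothetical.

```python
import numpy as np

def vmf_log_pdf(x, mu, kappa):
    """Log density of a von Mises-Fisher distribution on the unit sphere S^2.

    Density: kappa / (4*pi*sinh(kappa)) * exp(kappa * mu.x), with the
    normaliser rewritten so that large kappa does not overflow.
    """
    log_c = (np.log(kappa) - np.log(2.0 * np.pi) - kappa
             - np.log1p(-np.exp(-2.0 * kappa)))
    return log_c + kappa * (x @ mu)

def accumulate_evidence(app_loglik, view_dirs, kappa=5.0):
    """Best accumulated log evidence for one object model (assumed scheme).

    app_loglik: (T, V) appearance log-likelihood of frame t under view v.
    view_dirs:  (V, 3) unit vectors giving the pose of each stored view.
    """
    T, V = app_loglik.shape
    # trans[u, v]: smoothness score for moving from view u to view v.
    # The vMF density is used as an unnormalised score over discrete views.
    trans = vmf_log_pdf(view_dirs, view_dirs.T, kappa)
    # Uniform prior over the initial pose (an assumption).
    delta = app_loglik[0] - np.log(V)
    for t in range(1, T):
        # Keep, for each current view, the best-scoring predecessor.
        delta = app_loglik[t] + np.max(delta[:, None] + trans, axis=0)
    return float(delta.max())

def recognise(models, kappa=5.0):
    """Return the model name with maximum accumulated evidence, plus all scores."""
    scores = {name: accumulate_evidence(a, d, kappa)
              for name, (a, d) in models.items()}
    return max(scores, key=scores.get), scores
```

Here `models` maps each object name to a pair of arrays `(app_loglik, view_dirs)`. Working in log space keeps the accumulated evidence numerically stable over long sequences, and a larger `kappa` penalises abrupt pose changes between consecutive frames more strongly.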