This paper proposes a model-based methodology for recognizing and tracking objects in digital image sequences. Objects are represented by attributed relational graphs (ARGs), which carry both local and relational information about them. Recognition is performed by inexact graph matching, which consists in finding an approximate homomorphism between the ARGs derived from an input video and the ARG derived from a model image. A suitable homomorphism is sought with a tree-search optimization algorithm that minimizes a pre-defined cost function. Motion smoothness between successive frames is exploited to carry recognition across the whole sequence with improved spatio-temporal coherence.
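To make the matching step concrete, the following is a minimal sketch of inexact ARG matching by tree search for a single frame. It is an illustration under stated assumptions, not the cost function or search strategy of the paper: vertex and edge attributes are reduced to single scalars, the cost is a weighted sum of attribute differences with a hypothetical weight `alpha`, a hypothetical `MISSING_EDGE_PENALTY` is charged when an input edge has no counterpart in the model, and the search is a plain depth-first branch and bound. The names `ARG`, `step_cost`, and `match` are likewise illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph with scalar attributes (an assumption)."""
    vertices: dict                             # vertex id -> attribute
    edges: dict = field(default_factory=dict)  # (u, v) -> attribute

    def edge(self, u, v):
        # Undirected lookup; returns None when the edge is absent.
        if (u, v) in self.edges:
            return self.edges[(u, v)]
        return self.edges.get((v, u))


MISSING_EDGE_PENALTY = 1.0  # hypothetical penalty, not from the paper


def step_cost(inp, model, mapping, v, m, alpha=0.5):
    """Cost added when input vertex v is mapped to model vertex m."""
    vertex_term = abs(inp.vertices[v] - model.vertices[m])
    edge_term = 0.0
    for u, mu in mapping.items():
        e_in = inp.edge(u, v)
        # Skip absent input edges and pairs merged into one model vertex.
        if e_in is None or mu == m:
            continue
        e_model = model.edge(mu, m)
        if e_model is None:
            edge_term += MISSING_EDGE_PENALTY
        else:
            edge_term += abs(e_in - e_model)
    return alpha * vertex_term + (1 - alpha) * edge_term


def match(inp, model, alpha=0.5):
    """Depth-first branch and bound over input-to-model assignments."""
    order = list(inp.vertices)
    best = {"cost": float("inf"), "mapping": None}

    def expand(depth, mapping, cost):
        # Costs are non-negative, so a partial cost is a valid lower bound.
        if cost >= best["cost"]:
            return
        if depth == len(order):
            best["cost"], best["mapping"] = cost, dict(mapping)
            return
        v = order[depth]
        for m in model.vertices:  # many-to-one mappings are allowed
            delta = step_cost(inp, model, mapping, v, m, alpha)
            mapping[v] = m
            expand(depth + 1, mapping, cost + delta)
            del mapping[v]

    expand(0, {}, 0.0)
    return best["mapping"], best["cost"]


# Toy data: a two-part model and an over-segmented input frame.
model = ARG({"a": 0.2, "b": 0.8}, {("a", "b"): 1.0})
frame = ARG({1: 0.25, 2: 0.22, 3: 0.75},
            {(1, 2): 0.1, (1, 3): 1.0, (2, 3): 0.9})
mapping, cost = match(frame, model)
```

Because several input vertices may map to the same model vertex, the result is a homomorphism rather than an isomorphism, which suits an over-segmented input. In this sketch, temporal coherence could be approximated by adding a term to `step_cost` that penalizes a label differing from the one assigned in the previous frame; that extension is also an assumption rather than the paper's formulation.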