Human movements are important cues for recognizing human actions. They can be captured by explicitly modeling and tracking the actor or through low-level space-time features. However, regardless of the modeling method, human dynamics alone are not enough to discriminate between actions with similar dynamics, such as smoking and drinking. Object perception plays an important role in such cases. Conversely, human movements are indicative of the type of object used in the action. Object perception and action understanding are thus not independent processes, and action recognition improves when human movements and object perception are used in conjunction. We therefore propose a probabilistic approach that simultaneously infers what action was performed, what object was used, and what poses the actor went through. This joint inference framework can better discriminate between actions and objects that are too similar and lack discriminative features...
Furqan M. Khan, Vivek Kumar Singh, Ram Nevatia