This paper proposes a generic method for action recognition
in uncontrolled videos. The idea is to use images
collected from the Web to learn representations of actions
and use this knowledge to automatically annotate actions
in videos. Our approach is unsupervised in the sense that it
requires no human intervention beyond the initial text queries.
Its benefits are two-fold: 1) we can improve retrieval of action
images, and 2) we can collect a large generic database
of action poses, which can then be used in tagging videos.
We present experimental evidence that annotating actions in
videos is feasible using action images collected from the Web.
Nazli Ikizler-Cinbis, R. Gokberk Cinbis, Stan Sclaroff