This paper proposes a framework to aid video analysts in detecting suspicious activity within the tremendous amount of video data that exists in today’s world of omnipresent surveillance video. Ideas and techniques for closing the semantic gap between the low-level, machine-readable features of video data and the high-level events seen by a human observer are discussed. An evaluation of the event classification and detection technique is presented, and a future experiment to refine this technique is proposed. These experiments lead into a discussion of the machine learning algorithm best suited to learning the event representation scheme proposed in this paper.
Gal Lavee, Latifur Khan, Bhavani M. Thuraisingham