In this paper we present a simple framework for activity recognition based on a model of multi-layered finite state machines, built on top of a low level image processing module for spatio-temporal detections and limited object identification. The finite state machine network learns, in an unsupervised mode, usual patterns of activities in a scene over long periods of time. Then, in the recognition phase, usual activities are accepted as normal and deviant activity patterns are flagged as abnormal. Results, on real image sequences, demonstrate the robustness of the framework.