In video surveillance, automatic methods for scene understanding and activity modeling can exploit the high redundancy of object trajectories observed over a long period of time. The goal of scene understanding is to generate a semantic model of the scene that describes the patterns of normal activity. We propose to improve the object classification performance of a real-time object tracker by accumulating statistics over time. The tracker first performs an initial three-class object classification (Vehicle, Pedestrian and Other) based on object shape. This initial labeling is usually very noisy because of object occlusions and merging and the possible presence of shadows. The proposed scene activity modeling approach is derived from the Makris and Ellis algorithm, in which the scene is described in terms of clusters of similar trajectories (called routes). The original envelope-based model is replaced by a simpler statistical model around each route node....
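The idea of correcting a noisy per-frame label by accumulating statistics over time can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a plain majority vote over the three tracker classes, and the class name `LabelAccumulator` and its methods are hypothetical names introduced only for illustration.

```python
from collections import Counter

# The three classes named in the text.
LABELS = ("Vehicle", "Pedestrian", "Other")


class LabelAccumulator:
    """Hypothetical helper: accumulates noisy labels and votes on the class.

    Assumption: a simple majority vote stands in for whatever statistic
    the actual approach accumulates over time.
    """

    def __init__(self):
        self.counts = Counter()

    def update(self, label):
        """Record one noisy label produced by the tracker."""
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        self.counts[label] += 1

    def best(self):
        """Return the most frequent label so far, or None if empty."""
        if not self.counts:
            return None
        # Ties are broken by the order of LABELS, for determinism.
        return max(LABELS, key=lambda name: self.counts[name])

    def confidence(self):
        """Return the fraction of observations agreeing with best()."""
        total = sum(self.counts.values())
        return self.counts[self.best()] / total if total else 0.0


if __name__ == "__main__":
    acc = LabelAccumulator()
    # Hypothetical noisy labels, e.g. corrupted by occlusion or shadows.
    for observed in ["Vehicle", "Other", "Vehicle", "Pedestrian", "Vehicle"]:
        acc.update(observed)
    print(acc.best(), acc.confidence())
```

One accumulator could be kept per tracked object or per route, so that occasional mislabelings are outvoted as more observations arrive; which granularity is appropriate is an assumption left open here.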