This paper presents a content-based approach to the temporal segmentation of videos. Tracked objects are characterized by their 2D trajectories, which are used to model visual semantics, i.e., the observed activities of single video objects and their interactions. To this end, hierarchical Semi-Markov Chains (SMCs) are computed in order to account for the temporal causalities of object motions. Object movements are characterized using local invariant features computed from curvature and velocity values, while interactions are represented by the temporal evolution of the distance between objects. We have evaluated our method on squash video sequences, where it compares favorably with other methods, including Hidden Markov Models (HMMs).
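
As an illustration of the low-level quantities mentioned above, the following minimal sketch computes speed and signed curvature along a sampled 2D trajectory, together with the distance between two objects over time. It is not the method of this paper: the function names `speed_and_curvature` and `inter_object_distance`, the central finite-difference scheme, and the uniform sampling step `dt` are all assumptions made for the example, and the sketch returns raw speed and curvature values rather than the invariant features built from them.

```python
import math


def speed_and_curvature(points, dt=1.0):
    """Return (speed, curvature) at each interior sample of a 2D trajectory.

    `points` is a list of (x, y) positions sampled every `dt` time units.
    Derivatives are estimated with central finite differences (an assumption
    of this sketch, not a choice taken from the paper).
    """
    features = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        # First derivatives (velocity components).
        dx = (x2 - x0) / (2 * dt)
        dy = (y2 - y0) / (2 * dt)
        # Second derivatives (acceleration components).
        ddx = (x2 - 2 * x1 + x0) / dt ** 2
        ddy = (y2 - 2 * y1 + y0) / dt ** 2
        speed = math.hypot(dx, dy)
        # Signed curvature of a planar curve; set to 0 when the object is still.
        curvature = (dx * ddy - dy * ddx) / speed ** 3 if speed > 0 else 0.0
        features.append((speed, curvature))
    return features


def inter_object_distance(traj_a, traj_b):
    """Return the Euclidean distance between two objects at each time step."""
    return [
        math.hypot(xa - xb, ya - yb)
        for (xa, ya), (xb, yb) in zip(traj_a, traj_b)
    ]
```

For a counter-clockwise circular trajectory of radius 2, the estimated curvature is close to 0.5 at every interior sample, and a straight-line trajectory gives a curvature of 0. Sequences of such values, one per tracked object or object pair, are the kind of observations a chain-based temporal model could be built on.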