Abstract. This paper presents a novel descriptor for human detection in video sequence. It is referred to as spatial-temporal granularity -tunable gradients partition (STGGP), which is an extension of granularity-tunable gradients partition (GGP) from the still image domain to the spatial-temporal domain. Specifically, the moving human body is considered as a 3-dimensional entity in the spatial-temporal domain. Then in 3D Hough space, we define the generalized plane as a primitive to parse the structure of this 3D entity. The advantage of the generalized plane is that it can tolerate imperfect planes with certain level of uncertainty in rotation and translation. The robustness to the uncertainty is controlled quantitatively by the granularity parameters defined explicitly in the generalized plane. This property endows the STGGP descriptors versatile ability to represent both the deterministic structures and the statistical summarizations of the object. Moreover, the STGGP descriptor en...