In this paper, we aim to detect humans in video over large viewpoint changes, which is very challenging because human appearance and motion are far more diverse across a wide range of viewpoints than in the common frontal view. We propose 1) a new feature, the Intra-frame and Inter-frame Comparison Feature, which combines appearance and motion information; 2) an Enhanced Multiple Clusters Boost algorithm, which automatically co-clusters the samples of various viewpoints and the discriminative features; and 3) a Multiple Video Sampling strategy, which makes the approach robust to human motion and frame rate changes. Because of the large number of samples and features, we propose a two-stage tree-structured detector that uses only appearance in the first stage and both appearance and motion in the second stage. We evaluate our approach on challenging real-world scenes from the PETS2007 dataset, the ETHZ dataset, and our own collected videos; the results demonstrate its effectiveness and efficiency.