We propose a novel technique for estimating the number of people in a video sequence. It remains stable even in crowded scenes and requires no ground-truth data. By quantitatively analyzing the geometrical relationships between image pixels and their intersection volumes in the real world, we show that a foreground image can directly indicate the number of people. Because foreground detection is possible even in crowded scenes, the proposed method can be applied to them. In addition, the method estimates the number of people in an a priori manner, so it needs no ground-truth data, which existing feature-based estimation techniques require. Experiments show the validity of the proposed method.
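
As an illustration only, the idea that a foreground image can indicate a count directly may be sketched as a weighted sum over foreground pixels. The abstract does not state the weighting formula, so the sketch below assumes a precomputed per-pixel weight map in which the pixels covered by one person sum to roughly 1. The function name `estimate_count`, the weight values, and the toy mask are all hypothetical and are not taken from the proposed method.

```python
def estimate_count(foreground, weights):
    """Sum the per-pixel weights over all foreground pixels.

    foreground: 2D list of 0/1 values (1 = foreground pixel).
    weights:    2D list of floats with the same shape (assumed given;
                in this sketch, one person's pixels sum to about 1).
    """
    if len(foreground) != len(weights):
        raise ValueError("mask and weight map must have the same height")
    total = 0.0
    for mask_row, weight_row in zip(foreground, weights):
        if len(mask_row) != len(weight_row):
            raise ValueError("mask and weight map must have the same width")
        for is_foreground, weight in zip(mask_row, weight_row):
            if is_foreground:
                total += weight
    return total


# Hypothetical weight map: people far from the camera (top rows) cover
# fewer pixels, so each of their pixels carries a larger weight.
weights = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]

# Toy foreground mask: one distant blob (2 pixels), one nearby blob (4 pixels).
foreground = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

estimated_people = estimate_count(foreground, weights)
```

Under these assumptions no training against labeled counts is involved: the weights would come from the camera geometry alone, which is the sense in which such an estimate is a priori.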