This paper presents a novel people detection and tracking method based on a multi-modal sensor fusion approach that utilizes 2D laser range and camera data. The data points in the laser scans are clustered using a novel graph-based method and an SVM based version of the cascaded AdaBoost classifier is trained with a set of geometrical features of these clusters. In the detection phase, the classified laser data is projected into the camera image to define a region of interest for the vision-based people detector. This detector is a fast version of the Implicit Shape Model (ISM) that learns an appearance codebook of local SIFT descriptors from a set of hand-labeled images of pedestrians and uses them in a voting scheme to vote for centers of detected people. The extension consists in a fast and detailed analysis of the spatial distribution of voters per detected person. Each detected person is tracked using a greedy data association method and multiple Extended Kalman Filters that use ...