Abstract This work introduces a self-supervised architecture for robust classification of moving obstacles in urban environments. Our approach is a hierarchical scheme that relies on the stability of a subset of features provided by one sensor to perform an initial robust classification based on unsupervised techniques. The results are then used as labels to train a set of supervised classifiers on data from a second sensor. The outcomes obtained with the second sensor can be used for higher-level tasks such as segmentation or to refine the within-cluster discrimination. The proposed architecture is evaluated for a particular realization based on range and visual information, which produces track-based labeling that is then employed to train supervised modules that perform instantaneous classification. Experiments show that the system achieves 95% classification accuracy and maintains this performance through on-line retraining when working conditions change.

Keywords Self-supervised learning