This paper proposes a system that relates objects in an image using occlusion cues and orders them by depth. The system does not rely on any a priori knowledge of the scene structure; instead, it detects specific points, such as T-junctions, to infer the depth relationships between objects in the scene. It makes extensive use of the Binary Partition Tree (BPT) as the segmentation tool, together with a new approach to T-junction estimation. Following a bottom-up strategy, regions (initially individual pixels) are iteratively merged until only one region is left. At each merging step, the system estimates the probability of observing a T-junction, which is a cue of occlusion when three regions meet. Once the BPT has been constructed and pruned, this information is used for depth ordering. Although the proposed system relies on only one low-level depth cue and does not involve any learning process, its performance is comparable to the state of the art.
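The bottom-up merging that produces the BPT can be illustrated with a minimal sketch. It is not the system's implementation: the merging criterion used here (absolute difference of mean gray level between 4-connected regions) is an assumption chosen for brevity, the names `build_bpt` and `leaves_under` are hypothetical, and the T-junction probability estimated at each merging step is omitted.

```python
import heapq


def build_bpt(image):
    """Build a binary partition tree over a 2-D list of gray levels.

    Leaves are pixels (id = row * width + col). Each merge creates a new
    node id. Returns (parent, children, root).
    """
    h, w = len(image), len(image[0])
    # Region statistics: (sum of gray levels, pixel count).
    stats = {r * w + c: (float(image[r][c]), 1)
             for r in range(h) for c in range(w)}
    # 4-connected adjacency between the initial single-pixel regions.
    neighbors = {i: set() for i in stats}
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                neighbors[i].add(i + 1)
                neighbors[i + 1].add(i)
            if r + 1 < h:
                neighbors[i].add(i + w)
                neighbors[i + w].add(i)

    def cost(a, b):
        # Assumed similarity measure: difference of region means.
        (sa, na), (sb, nb) = stats[a], stats[b]
        return abs(sa / na - sb / nb)

    heap = [(cost(a, b), a, b)
            for a in neighbors for b in neighbors[a] if a < b]
    heapq.heapify(heap)

    parent, children = {}, {}
    next_id = h * w
    while heap:
        _, a, b = heapq.heappop(heap)
        if a in parent or b in parent:
            continue  # stale entry: one of the regions was already merged
        new = next_id
        next_id += 1
        parent[a] = parent[b] = new
        children[new] = (a, b)
        stats[new] = (stats[a][0] + stats[b][0], stats[a][1] + stats[b][1])
        # The merged region inherits the neighbors of both children.
        adjacent = (neighbors.pop(a) | neighbors.pop(b)) - {a, b}
        neighbors[new] = adjacent
        for n in adjacent:
            neighbors[n] -= {a, b}
            neighbors[n].add(new)
            heapq.heappush(heap, (cost(new, n), new, n))
    return parent, children, next_id - 1


def leaves_under(node, children):
    """Return the pixel ids covered by a tree node."""
    if node not in children:
        return [node]
    a, b = children[node]
    return leaves_under(a, children) + leaves_under(b, children)
```

For an image with N pixels the tree has N leaves and N - 1 merge nodes, and the root covers the whole image. In this sketch the tree is kept as plain `parent` and `children` dictionaries, so a later pruning or depth-ordering pass could walk it without extra data structures.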