We propose a distributed congestion management scheme for non-blocking, 3-stage Clos networks comprising plain buffered crossbar switches. VOQ requests are routed, using multipath routing, to the switching elements of the 3rd stage, and grants travel back to the linecards in the reverse direction. The fabric elements contain independent single-resource schedulers that serve requests and grants in a pipeline. Like any other network with limited capacity, this scheduling network may suffer from oversubscribed links, hotspot contention, etc., which we identify and tackle. We also reduce the cost of internal buffers by reducing the data RTT and by allowing sub-RTT crosspoint buffers. Performance simulations demonstrate that, with almost all outputs congested, packets destined to non-congested outputs experience very low delays (flow isolation). For applications requiring very low communication delays, we propose a second, parallel operation mode, wherein linecards can forward a few packets eage...