We present the architecture and practical VLSI implementation of a 4-Tb/s single-stage switch. It is based on a combined input- and crosspoint-queued structure with virtual output queuing at the ingress, which has the scalability of input-buffered switches and the performance of output-buffered switches. Our system handles the large fabric-internal transmission latency that results from packaging up to 256 line cards into multiple racks. We provide the justification for selecting this architecture and compare it with other current solutions. With an ASIC implementation, we show that a single-stage multi-terabit buffered crossbar approach is viable today.
François Abel, Cyriel Minkenberg, Ronald P.