Clos networks are an important class of switching networks due to their modular structure and much lower cost compared with crossbars. For routing I/O permutations of Clos networks, sequential routing algorithms are too slow, and all known parallel algorithms are not practical. We present the algorithm-hardware codesign of a unified fast parallel routing architecture called distributed pipeline routing (DPR) architecture for rearrangeable nonblocking and strictly nonblocking Clos networks. The DPR architecture uses a linear interconnection structure and processing elements that performs only shift and logic AND operations. We show that a DPR architecture can route any permutation in rearrangeable nonblocking and strictly nonblocking Clos networks in O( √ N) time. The same architecture can be used to carry out control of any group of connection/disconnection requests for strictly nonblocking Clos networks in O( √ N) time. Several speeding-up techniques are also presented. This arc...