The full-custom CMOS realization of a new modular sorting architecture is presented. The high-performance architecture is based on rank ordering, and on efficient implementation of multi-input majority (voting) functions. The overall complexity of the proposed bitserial architecture increases linearly with the number of input vectors to be sorted (window size = m) and with the bit-length of the input vectors (word size = n), and the sorter architecture can be easily expanded to accommodate large vector sets. It is shown that the proposed sorting engine is capable of producing a fully sorted output vector set in (m+n-1) clock cycles, i.e., in linear time. To demonstrate the concept, a full-custom sorting engine is realized to process 63 input vectors of 16-bits (m = 63, n = 16), using conventional 0.35 µm CMOS technology. The resulting sorter chip occupies a silicon area of 13 sqmm, operates at a clock frequency of 200 MHz, and it is capable of completing the sorting operation of 63 1...