The parallel packet switch (PPS) extends the inverse multiplexing architecture and is extensively used as the core of contemporary commercial switches. A key factor in the performance of a PPS is its demultiplexing algorithm, which is responsible for dispatching cells to the middle-stage switches. This paper investigates the inherent queuing delay and delay jitter introduced by the PPS’s demultiplexing algorithm, relative to an optimal work-conserving switch. We show that the inherent queuing delay of a symmetric and fault-tolerant N × N PPS, in which every demultiplexing algorithm dispatches cells to all the middle-stage switches, is Ω(N) if there are no buffers in the PPS input ports. If the demultiplexing algorithms dispatch cells to only a subset of the middle-stage switches, the queuing delay and delay jitter are Ω(N/S), where S is the PPS speedup. These lower bounds hold unless the demultiplexing algorithm has full and immediate knowledge of the switch status. (The specific constants, ...