Switching cells in parallel is a common approach to building switches with a very high external line rate and a large number of ports. A prime example is the parallel packet switch (PPS), in which a demultiplexing algorithm sends cells, arriving at rate R on N input ports, through one of K intermediate slower switches operating at rate r < R. This paper presents lower bounds on the average queuing delay introduced by the PPS relative to an optimal work-conserving FCFS switch, for demultiplexing algorithms that do not have full and immediate information about the switch status. The bounds hold even if the algorithm is randomized. These lower bounds are shown to be asymptotically optimal through a new methodology for analyzing the maximal relative queuing delay, which clearly upper-bounds the average relative queuing delay. The methodology is used to devise a new algorithm that relies on slightly outdated global information on the switch status. It is also used to provide, fo...