This paper investigates protocol overhead pipelining between the host and the network interface card (NIC). Existing research on protocol overhead pipelining assumes that the protocol overheads in the host and the NIC can be pipelined naturally. Our architecture-aware investigation, however, reveals that the host and the NIC compete with each other for access to the host memory, the system bus, and the I/O bus. This contention seriously hinders overhead pipelining and leads to sub-optimal performance. We propose several methods to avoid such contention for hardware resources and implement a pipelining UDP, named π-UDP, on Myrinet. As a result, π-UDP achieves over 97% of the theoretical maximum throughput of Myrinet.