Router microarchitecture plays a central role in the performance of an on-chip network (NoC). Routers need buffers to hold incoming flits that cannot be forwarded immediately because of contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR), respectively. OBRs are attractive because they can sustain higher throughput and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, which makes such a design prohibitive given the aggressive clocking needs and limited power budgets of most NoC applications. In this paper, we propose a new router design, based on a distributed shared-buffer (DSB) router architecture, that aims to emulate an OBR practically. We introduce innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. We also present practical DSB ...