A critical problem in wide-issue superscalar processors is the limit on cycle time imposed by the central register file and operand bypass network. This paper presents a distributed register file architecture that employs fully distributed functional unit clusters. The architecture uses a local register mapping table and a dedicated register transfer network to support distributed register operations. In addition, an eager transfer mechanism is developed to reduce the penalties caused by an incomplete operand transport interconnect. Distributed register files can reduce operand access time by a factor of two, with associated average IPC penalties of 14% and 21% on 4-way and 8-way superscalar architectures, respectively, across a broad range of symbolic, scientific, and multimedia applications. For SPECint2000 applications, the corresponding IPC penalties are only 3% and 10%.
Santithorn Bunchua, D. Scott Wills, Linda M. Wills