Modern processors can issue and execute multiple instructions per cycle, often performing multiple memory operations simultaneously. To reduce stalls due to resource conflicts, most processors employ multi-ported L1 caches and TLBs to enable concurrent memory accesses. In this paper, we observe that data TLB lookups within a cycle and across consecutive cycles are often synonymous — they go to the same page. To exploit this finding, we propose two new mechanisms — intra-cycle compaction and inter-cycle compaction of address translation requests in order to save energy in the data TLB. Our results show that average energy savings of 27% using intra-cycle, 42% using inter-cycle in a conventional d-TLB, and 56% using inter-cycle compaction in semantic-aware d-TLBs can be achieved. When these 2 compaction techniques are combined together and applied to both the i-TLB and semantic-aware d-TLBs, an average energy savings of 76% (up to 87%) is obtained. Categories and Subject Descripto...
Chinnakrishnan S. Ballapuram, Hsien-Hsin S. Lee, M