Code generation for embedded processors creates opportunities for several performance optimizations not applicable for traditional compilers. We present techniques for improving d...
Preeti Ranjan Panda, Nikil D. Dutt, Alexandru Nico...
Abstract— In this paper, we describe the design and implementation of the primary memory system of the TRIPS processor. To match the aggressive execution bandwidth and support hi...
Simha Sethumadhavan, Robert G. McDonald, Rajagopal...
The growing influence of wire delay in cache design has meant that access latencies to last-level cache banks are no longer constant. Non-Uniform Cache Architectures (NUCAs) have ...
The increasing use of microprocessor cores in embedded systems as well as mobile and portable devices creates an opportunity for customizing the cache subsystem for improved perfo...
Memory interleaving is a cost-efficient approach to increase bandwidth. Improving data access locality and reducing memory access conflicts are two important aspects to achieve hi...