GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
Memory interleaving is a cost-efficient approach to increase bandwidth. Improving data access locality and reducing memory access conflicts are two important aspects to achieve hi...
This paper proposes a theoretical framework for verifying and deriving code optimizations for programs written in parallel programming languages. The key idea of this framework is...
Cross-layer optimization aims at improving the performance of network users operating in a time-varying, error-prone wireless environment. However, current solutions often rely on...