Hard-to-predict branches depending on longlatency cache-misses have been recognized as a major performance obstacle for modern microprocessors. With the widening speed gap between...
Hongliang Gao, Yi Ma, Martin Dimitrov, Huiyang Zho...
Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...
Estimating the maximum power and thermal characteristics of a processor is essential for designing its power delivery system, packaging, cooling, and power/thermal management sche...
Ajay M. Joshi, Lieven Eeckhout, Lizy Kurian John, ...
—Bulk memory copying and initialization is one of the most ubiquitous operations performed in current computer systems by both user applications and Operating Systems. While many...
Xiaowei Jiang, Yan Solihin, Li Zhao, Ravishankar I...
Background: BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled &...