Abstract. In this paper, we propose a processor architecture with programmable on-chip memory for a high-performance SMP (symmetric multi-processor) node named SCIMA-SMP (Software ...
This paper presents reduction recognition and parallel code generationstrategies for distributed-memorymultiprocessors. We describe techniques to recognize a broad range of implic...
Chip Multiprocessor (CMP) memory systems suffer from the effects of destructive thread interference. This interference reduces performance predictability because it depends heavil...
Data-Pipelining is a widely used model to represent streaming applications. Incremental decomposition and optimization of a data-pipelining application onto a multi-processor plat...
Bo-Cheng Charles Lai, Patrick Schaumont, Wei Qin, ...
Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications h...