It is an important problem to map virtual parallel processes to physical processors (or cores) in an optimized way to get scalable performance due to non-uniform communication cost...
Jin Zhang, Jidong Zhai, Wenguang Chen, Weimin Zhen...
This paper examines the scalable parallel implementation of QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-...
Modern microprocessor technology is migrating from simply increasing clock speeds on a single processor to placing multiple processors on a die to increase throughput and power pe...
Bhavana B. Manjunath, Aaron S. Williams, Chaitali ...
Providing QoS and performance guarantees to arbitrarily divisible loads has become a significant problem for many cluster-based research computing facilities. While progress is b...
Xuan Lin, Ying Lu, Jitender S. Deogun, Steve Godda...
Compiler technology for multimedia extensions must effectively utilize not only the SIMD compute engines but also the various levels of the memory hierarchy: superword registers,...
Chun Chen, Jaewook Shin, Shiva Kintali, Jacqueline...