Improving memory performance at the software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...
stractions are extensively used to understand and solve challenging computational problems in various scientific and engineering domains. They have particularly gained prominence...
The ability to quickly predict the throughput of a TCP transfer between a client and a server, or between peers, has wide application in scientific computing and commercial compu...
Abstract. We introduce a collection of high performance kernels for basic linear algebra. The kernels encapsulate small fixed size computations in order to provide building blocks fo...
Abstract. With more cores integrated into a single chip, the overall power consumption from the multiple concurrently running programs increases dramatically in a CMP processor whi...