Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory...
Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system...
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K....
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
On-line transaction processing exhibits poor memory behavior in high-end multiprocessor servers because of complex sharing patterns and substantial interaction between the databas...
Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wide-issue aggressive superscalars. In this paper, we focus on Database applicatio...