When multiple processor (CPU) cores and a GPU integrated together on the same chip share the off-chip main memory, requests from the GPU can heavily interfere with requests from t...
Rachata Ausavarungnirun, Kevin Kai-Wei Chang, Lava...
We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a...
Scheduling query execution plans is a particularly complex problem in hierarchical parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or...
Abstract. Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is f...
Frank K. H. A. Dehne, Wolfgang Dittrich, David A. ...
Abstract. By simulating a real computer it is possible to gain a detailed knowledge of the cache memory utilization of an application, e.g., a partial differential equation (PDE) s...