ed Abstract We present two heuristic mesh-partitioning methods, both of which build on the multiple ant-colony algorithm in order to improve the quality of the mesh partitions. The...
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and s...
Abstract--Memory accesses are a major cause of energy consumption for embedded systems and the stack is a frequent target for data accesses. This paper presents a fully software te...
A central problem in extending the von Neumann architecture to petaflop computers with millions of hardware threads and with a shared memory is defining the memory model [Lam79,...