Achieving high performance on today’s architectures requires careful orchestration of many optimization parameters. In particular, the presence of shared-caches on multicore arch...
Numerical simulations in computational physics, biology, and finance, often require the use of high quality and efficient parallel random number generators. We design and optimi...
David A. Bader, Aparna Chandramowlishwaran, Virat ...
Abstract. Parallel loops account for the greatest percentage of program parallelism. The degree to which parallelism can be exploited and the amount of overhead involved during par...
Arun Kejariwal, Paolo D'Alberto, Alexandru Nicolau...
We propose Instruction-based Prediction as a means to optimize directory-based cache coherent NUMA shared-memory. Instruction-based prediction is based on observing the behavior o...
– This paper addresses the problem of ordering and sizing parallel wires in a single metal layer within an interconnect channel of a given width, such that crosscapacitances are ...