Bulk memory copies incur large overheads such as CPU stalling (i.e., no overlap of computation with memory copy operation), small register-size data movement, cache pollution, etc...
Karthikeyan Vaidyanathan, Lei Chai, Wei Huang, Dha...
Parallel and concurrent garbage collectors are increasingly employed by managed runtime environments (MREs) to maintain scalability, as multi-core architectures and multi-threaded...
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-mem...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay ...
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
The performance of a concurrent multithreaded architectural model, called superthreading 15 , is studied in this paper. It tries to integrate optimizing compilation techniques and...
Jenn-Yuan Tsai, Zhenzhen Jiang, Eric Ness, Pen-Chu...