With the advent of increasingly complex hardware in realtime embedded systems (processors with performance enhancing features such as pipelines, cache hierarchy, multiple cores), ...
This paper describes a novel approach to generate an optimized schedule to run threads on distributed shared memory (DSM) systems. The approach relies upon a binary instrumentatio...
Although event tracing of parallel applications offers highly detailed performance information, tracing on current leading edge systems may lead to unacceptable perturbation of the...
In many computer systems with large data computations, the delay of memory access is one of the major performance bottlenecks. In this paper, we propose an enhanced field remappi...
Keoncheol Shin, Jungeun Kim, Seonggun Kim, Hwansoo...
Thread migration moves a single call-stack to another machine to improve either load balancing or locality. Current approaches for checkpointing and thread migration are either no...