Abstract. Many-core processor architectures require scalable solutions that reflect the locality and power constraints of future generations of technology. This paper presents a CM...
The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software c...
Joseph Gebis, Leonid Oliker, John Shalf, Samuel Wi...
Abstract. In this paper we present a partial bitstreams ultra-fast downloading process through a standard Ethernet network. These Virtex-based and partially reconfigurable systems...
Pierre Bomel, Jeremie Crenne, Linfeng Ye, Jean-Phi...
Resource management has been an area of research in ad hoc grids for many years. Recently, different research projects have focused resource management in centralized, decentralize...
Abstract. Java is widely deployed on a variety of processor architectures. Consequently, an understanding of microarchitecture level Java performance is critical to optimize curren...
In sampling based hotspot detection, performance engineers sample the running program periodically and record the Instruction Pointer (IP) addresses at the sampling. Empirically, f...
In previous work the 3D-Wave parallelization strategy was proposed to increase the parallel scalability of H.264 video decoding. This strategy is based on the observation that inte...
Arnaldo Azevedo, Cor Meenderinck, Ben H. H. Juurli...
Abstract. Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog shared processor resources without making forward progress, thereby star...
Kenzo Van Craeynest, Stijn Eyerman, Lieven Eeckhou...
Abstract. Iterative compilation is an efficient approach to optimize programs on rapidly evolving hardware, but it is still only scarcely used in practice due to a necessity to gat...