Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible costeffective enhancement to high-performance systems. To date, mo...
Samer Al-Kiswany, Abdullah Gharaibeh, Elizeu Santo...
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to the outer loops. In a companion paper, we proposed a ...
Hongbo Rong, Alban Douillet, Ramaswamy Govindaraja...
Many of the recently proposed techniques to reduce power consumption in caches introduce an additional level of nondeterminism in cache access latency. Due to this additional late...
Soontae Kim, Narayanan Vijaykrishnan, Mary Jane Ir...
In this paper we present a parallel implementation of T–Coffee — a widely used multiple sequence alignment package. Our software supports a majority of options provided by the...
Jaroslaw Zola, Xiao Yang, Adrian Rospondek, Sriniv...
Because they are based on large content-addressable memories, load-store queues (LSQ) present implementation challenges in superscalar processors, especially as issue width and nu...