In this paper we show cache-friendly implementations of the Floyd-Warshall algorithm for the All-Pairs ShortestPath problem. We first compare the best commercial compiler optimiza...
DRAM power and energy efficiency considerations are becoming increasingly important for low-power and mobile systems. Using lower power modes provided by commodity DRAM chips redu...
The fill unit is the structure which collects blocks of instructions and combines them into multi-block segments for storage in a trace cache. In this paper, we expand the role of...
This paper presents an FPGA-friendly approach to tracking elephant flows in network traffic. Our approach, Single Step Segmented Least Recently Used (S3 -LRU) policy, is a netwo...
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...