The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
Abstract. The large and growing impact of memory hierarchies on overall system performance compels designers to investigate innovative techniques to improve memory-system efficienc...
Clustered microarchitectures are an effective organization to deal with the problem of wire delays and complexity by partitioning some of the processor resources. The organization ...
While set-associative caches incur fewer misses than directmapped caches, they typically have slower hit times and higher power consumption, when multiple tag and data banks are p...
Abstract— In a direct-mapped instruction cache, all instructions that have the same memory address modulo the cache size, share a common and unique cache slot. Instruction cache ...
Replacement policy, one of the key factors determining the effectiveness of a cache, becomes even more important with latest technological trends toward highly associative caches....
Hussein Al-Zoubi, Aleksandar Milenkovic, Milena Mi...
SQL extensions that allow queries to explicitly specify data quality requirements in terms of currency and consistency were proposed in an earlier paper. This paper develops a dat...
Algorithms for the sparse matrix-vector multiplication (shortly SpM×V ) are important building blocks in solvers of sparse systems of linear equations. Due to matrix sparsity, the...
Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program tran...
We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks. The L2 cache is organized as a n...
Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhan...