Sciweavers

HPCC
2005
Springer
15 years 11 months ago
Memory Subsystem Characterization in a 16-Core Snoop-Based Chip-Multiprocessor Architecture
In this paper we present an exhaustive evaluation of the memory subsystem in a chip-multiprocessor (CMP) architecture composed of 16 cores. The characterization is performed making...
Francisco J. Villa, Manuel E. Acacio, José ...
LCPC
2009
Springer
15 years 10 months ago
A Balanced Approach to Application Performance Tuning
Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a pana...
Souad Koliai, Stéphane Zuckerman, Emmanuel ...
IEEEPACT
1999
IEEE
15 years 10 months ago
On Reducing False Sharing while Improving Locality on Shared Memory Multiprocessors
The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coh...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
ASPLOS
2008
ACM
15 years 8 months ago
Optimistic parallelism benefits from data partitioning
Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism ar...
Milind Kulkarni, Keshav Pingali, Ganesh Ramanaraya...
ARVLSI
1997
IEEE
15 years 9 months ago
The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a...
Tadaaki Yamauchi, Lance Hammond, Kunle Olukotun