Sciweavers

1001 search results - page 43 / 201
» Improving memory hierarchy performance for irregular applica...
Sort
View
PPOPP
2009
ACM
16 years 6 months ago
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
PPOPP
2010
ACM
16 years 3 months ago
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
ICS
1998
Tsinghua U.
15 years 10 months ago
Data Prefetching for Software DSMs
In this paper we propose and evaluate the Adaptive++ technique, a novel runtime-only data prefetching strategy for software-based distributed shared-memory systems (software DSMs)...
Ricardo Bianchini, Raquel Pinto, Claudio Luis de A...
MICRO
2006
IEEE
79views Hardware» more  MICRO 2006»
16 years 3 days ago
Fair Queuing Memory Systems
We propose and evaluate a multi-thread memory scheduler that targets high performance CMPs. The proposed memory scheduler is based on concepts originally developed for network fai...
Kyle J. Nesbit, Nidhi Aggarwal, James Laudon, Jame...
DAC
2001
ACM
16 years 7 months ago
Performance-Driven Multi-Level Clustering with Application to Hierarchical FPGA Mapping
In this paper, we study the problem of performance-driven multi-level circuit clustering with application to hierarchical FPGA designs. We first show that the performance-driven m...
Jason Cong, Michail Romesis