Sciweavers

» High performance in tree-based parallel architectures
PPOPP
2009
ACM
14 years 9 months ago
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
ISPASS
2007
IEEE
14 years 3 months ago
Simplifying Active Memory Clusters by Leveraging Directory Protocol Threads
Address re-mapping techniques in so-called active memory systems have been shown to dramatically increase the performance of applications with poor cache and/or communication beha...
Dhiraj D. Kalamkar, Mainak Chaudhuri, Mark Heinric...
VRIPHYS
2010
13 years 3 months ago
Asynchronous Preconditioners for Efficient Solving of Non-linear Deformations
In this paper, we present a set of methods to improve numerical solvers, as used in real-time non-linear deformable models based on implicit integration schemes. The proposed appr...
Hadrien Courtecuisse, Jérémie Allard...
HPCA
2009
IEEE
14 years 9 months ago
Express Cube Topologies for on-Chip Interconnects
Driven by continuing scaling of Moore's law, chip multiprocessors and systems-on-a-chip are expected to grow the core count from dozens today to hundreds in the near future. ...
Boris Grot, Joel Hestness, Stephen W. Keckler, Onu...
HPCA
2009
IEEE
14 years 9 months ago
Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy
Cache hierarchies in future many-core processors are expected to grow in size and contribute a large fraction of overall processor power and performance. In this paper, we postula...
Niti Madan, Li Zhao, Naveen Muralimanohar, Anirudd...