Sciweavers

CF
2009
ACM
14 years 7 months ago
Mapping the LU decomposition on a many-core architecture: challenges and solutions
Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardware-managed cache hierarchies, they employ software-managed embedde...
Ioannis E. Venetis, Guang R. Gao
CF
2009
ACM
14 years 7 months ago
Towards automatic program partitioning
There is a trend towards using accelerators to increase performance and energy efficiency of general-purpose processors. Adoption of accelerators, however, depends on the availabi...
Sean Rul, Hans Vandierendonck, Koen De Bosschere
CF
2009
ACM
14 years 7 months ago
Scheduling dynamic parallelism on accelerators
Resource management on accelerator-based systems is complicated by the disjoint nature of the main CPU and accelerator, which involves separate memory hierarchies, different degr...
Filip Blagojevic, Costin Iancu, Katherine A. Yelic...
CF
2009
ACM
14 years 7 months ago
Quantitative analysis of sequence alignment applications on multiprocessor architectures
The exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms...
Friman Sánchez, Alex Ramírez, Mateo ...
CF
2009
ACM
14 years 7 months ago
Strategies for dynamic memory allocation in hybrid architectures
Hybrid architectures combining the strengths of general-purpose processors with application-specific hardware accelerators can lead to a significant performance improvement. Our ...
Peter Bertels, Wim Heirman, Dirk Stroobandt
CF
2009
ACM
14 years 7 months ago
High-performance SIMT code generation in an active visual effects library
SIMT (Single-Instruction Multiple-Thread) is an emerging programming paradigm for high-performance computational accelerators, pioneered in current and next-generation GPUs and hy...
Jay L. T. Cornwall, Lee W. Howes, Paul H. J. Kelly...
CF
2009
ACM
14 years 7 months ago
A light-weight fairness mechanism for chip multiprocessor memory systems
Chip Multiprocessor (CMP) memory systems suffer from the effects of destructive thread interference. This interference reduces performance predictability because it depends heavil...
Magnus Jahre, Lasse Natvig
CF
2009
ACM
14 years 7 months ago
Accelerating total variation regularization for matrix-valued images on GPUs
The advent of new matrix-valued magnetic resonance imaging modalities such as Diffusion Tensor Imaging (DTI) requires extensive computational acceleration. Computational accelera...
Maryam Moazeni, Alex A. T. Bui, Majid Sarrafzadeh
CF
2009
ACM
14 years 7 months ago
Evaluating multi-core platforms for HPC data-intensive kernels
Multi-core platforms have proven themselves able to accelerate numerous HPC applications. But programming data-intensive applications on such platforms is a hard, and not yet solve...
Alexander S. van Amesfoort, Ana Lucia Varbanescu, ...