Sciweavers

271 search results - page 44 / 55
» Bandwidth-Friendly Cache Hierarchy
Sort
View
HPCC
2007
Springer
14 years 4 months ago
A Block JRS Algorithm for Highly Parallel Computation of SVDs
This paper presents a new algorithm for computing the singular value decomposition (SVD) on multilevel memory hierarchy architectures. This algorithm is based on one-sided JRS iter...
Mostafa I. Soliman, Sanguthevar Rajasekaran, Reda ...
CODES
2003
IEEE
14 years 3 months ago
A low-cost memory architecture with NAND XIP for mobile embedded systems
NAND flash memory has become an indispensable component in mobile embedded systems because of its versatile features such as non-volatility, solid-state reliability, low cost and ...
Chanik Park, Jaeyu Seo, Sunghwan Bae, Hyojun Kim, ...
PDP
2010
IEEE
14 years 2 months ago
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
The increasing numbers of cores, shared caches and memory nodes within machines introduces a complex hardware topology. High-performance computing applications now have to carefull...
François Broquedis, Jérôme Cle...
IPPS
1999
IEEE
14 years 2 months ago
Cashmere-VLM: Remote Memory Paging for Software Distributed Shared Memory
Software distributed shared memory (DSM) systems have successfully provided the illusion of shared memory on distributed memory machines. However, most software DSM systems use th...
Sandhya Dwarkadas, Robert Stets, Nikos Hardavellas...
EUROPAR
2006
Springer
14 years 1 months ago
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64(C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...