Sciweavers

779 search results - page 118 / 156
» A Simple Program Transformation for Parallelism
Sort
View
ICS
2003
Tsinghua U.
14 years 28 days ago
Estimating cache misses and locality using stack distances
Cache behavior modeling is an important part of modern optimizing compilers. In this paper we present a method to estimate the number of cache misses, at compile time, using a mac...
Calin Cascaval, David A. Padua
IPPS
2009
IEEE
14 years 2 months ago
Singular value decomposition on GPU using CUDA
Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high per...
Sheetal Lahabar, P. J. Narayanan
ISCAS
2005
IEEE
155views Hardware» more  ISCAS 2005»
14 years 1 months ago
Hyperblock formation: a power/energy perspective for high performance VLIW architectures
— Architectures based on Very Long Instruction Word (VLIW) processors are an optimal choice in the attempt to obtain high performance levels in mobile devices. The effectiveness ...
Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi,...
ICCS
2005
Springer
14 years 1 months ago
Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization
Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
Sadaf R. Alam, Jeffrey S. Vetter
ICS
1999
Tsinghua U.
14 years 7 hour ago
Nonlinear array layouts for hierarchical memory systems
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory....
Siddhartha Chatterjee, Vibhor V. Jain, Alvin R. Le...