Sciweavers

272 search results - page 27 / 55
» Code Transformations to Improve Memory Parallelism
ISCA
1997
IEEE
14 years 25 days ago
DataScalar Architectures
DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, which are each tightly coupled with an associated memory....
Doug Burger, Stefanos Kaxiras, James R. Goodman
ISCAS
2007
IEEE
14 years 2 months ago
Macroblock-Level Adaptive Scan Scheme for Discrete Cosine Transform Coefficients
Discrete Cosine Transform (DCT) has been widely used in image/video coding systems, where zigzag scan is usually employed for DCT coefficient organization. However, due to local...
Li Zhang, Wen Gao, Qiang Wang, Debin Zhao
SAC
2008
ACM
13 years 8 months ago
A self-balancing striping scheme for NAND-flash storage systems
Using multiple memory banks in parallel is a natural approach to boosting the performance of flash-memory storage systems. However, realistic data-access localities unevenly load eac...
Yu-Bin Chang, Li-Pin Chang
LCPC
2009
Springer
14 years 1 months ago
Speculative Optimizations for Parallel Programs on Multicores
The advent of multicores presents a promising opportunity for exploiting the fine-grained parallelism present in programs. Programs parallelized in the above fashion, typically involv...
Vijay Nagarajan, Rajiv Gupta
ICPP
1998
IEEE
14 years 26 days ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang