Sciweavers

272 search results - page 12 / 55
» Code Transformations to Improve Memory Parallelism
Sort
View
EUROPAR
2010
Springer
13 years 8 months ago
Optimized On-Chip-Pipelined Mergesort on the Cell/B.E
Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
Rikard Hultén, Christoph W. Kessler, Jö...
IEEEPACT
2009
IEEE
14 years 3 months ago
Architecture Support for Improving Bulk Memory Copying and Initialization Performance
—Bulk memory copying and initialization is one of the most ubiquitous operations performed in current computer systems by both user applications and Operating Systems. While many...
Xiaowei Jiang, Yan Solihin, Li Zhao, Ravishankar I...
CLUSTER
2003
IEEE
14 years 1 months ago
Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layout and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...
SIGARCH
2008
94views more  SIGARCH 2008»
13 years 8 months ago
Parallelization, performance analysis, and algorithm consideration of Hough transform on chip multiprocessors
This paper presents a parallelization framework for emerging applications on the future chip multiprocessors (CMPs). With the continuing prevalence of CMP and the number of on-die...
Wenlong Li, Yen-Kuang Chen
HPCA
2012
IEEE
12 years 4 months ago
Improving write operations in MLC phase change memory
Phase change memory (PCM) recently has emerged as a promising technology to meet the fast growing demand for large capacity memory in modern computer systems. In particular, multi...
Lei Jiang, Bo Zhao, Youtao Zhang, Jun Yang 0002, B...