Sciweavers

HPCA
2009
IEEE
14 years 9 months ago
Design and implementation of software-managed caches for multicores with local memory
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
Sangmin Seo, Jaejin Lee, Zehra Sura
SIGSOFT
2008
ACM
14 years 9 months ago
A scalable technique for characterizing the usage of temporaries in framework-intensive Java applications
Framework-intensive applications (e.g., Web applications) heavily use temporary data structures, often resulting in performance bottlenecks. This paper presents an optimized blend...
Bruno Dufour, Barbara G. Ryder, Gary Sevitsky
CGF
2011
13 years 3 months ago
A Parallel SPH Implementation on Multi-Core CPUs
This paper presents a parallel framework for simulating fluids with the Smoothed Particle Hydrodynamics (SPH) method. For low computational costs per simulation step, efficient ...
Markus Ihmsen, Nadir Akinci, Markus Becker, Matthi...
DAC
2010
ACM
13 years 8 months ago
Instruction cache locking using temporal reuse profile
The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the per...
Yun Liang, Tulika Mitra
AFRICACRYPT
2010
Springer
14 years 3 months ago
ECC2K-130 on Cell CPUs
This paper describes an implementation of Pollard’s rho algorithm to compute the elliptic curve discrete logarithm for the Synergistic Processor Elements of the Cell Broadband En...
Joppe W. Bos, Thorsten Kleinjung, Ruben Niederhage...