In the era of multicores, many applications that demand substantial compute power and data crunching (often called throughput computing applications) can now run on desktop PCs. However, to achieve the best possible performance, applications must be written to exploit both parallelism and cache locality. In this paper, we propose one such approach for x86-based architectures. Our approach uses cache-oblivious techniques to divide a large problem into smaller subproblems, which are mapped to different cores or threads. We then use the compiler to exploit SIMD parallelism within each subproblem. Finally, we use autotuning to pick the best parameter values throughout the optimization process. We have implemented our approach with the Intel® Compiler and the newly developed Intel® Software Autotuning Tool. Experimental results collected on a dual-socket quad-core Nehalem show that our approach achieves an average speedup of almost 20x over the best serial cases for an i...
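To make the decomposition concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of a cache-oblivious recursive subdivision, using matrix transpose as the example problem. The recursion halves the longer dimension until a subproblem fits a small tile; the simple loop nest in the base case is the kind of kernel a vectorizing compiler can turn into SIMD code, and the two independent recursive calls are natural units to map onto different cores or threads. The tile size `BASE` is an illustrative constant standing in for a parameter that autotuning would select.

```c
#include <stddef.h>

/* Illustrative cutoff; in practice this would be an autotuned parameter. */
#define BASE 16

/* Transpose the block a[r0..r1)[c0..c1) of an n-by-n row-major matrix
 * into b, by cache-oblivious recursive subdivision. */
static void transpose_rec(const double *a, double *b, size_t n,
                          size_t r0, size_t r1, size_t c0, size_t c1) {
    size_t rows = r1 - r0, cols = c1 - c0;
    if (rows <= BASE && cols <= BASE) {
        /* Base case: a small tile that fits in cache; this plain loop
         * nest is amenable to compiler auto-vectorization (SIMD). */
        for (size_t i = r0; i < r1; i++)
            for (size_t j = c0; j < c1; j++)
                b[j * n + i] = a[i * n + j];
    } else if (rows >= cols) {
        /* Split the longer axis; the halves are independent subproblems
         * that could run on different cores (e.g., as OpenMP tasks). */
        size_t rm = r0 + rows / 2;
        transpose_rec(a, b, n, r0, rm, c0, c1);
        transpose_rec(a, b, n, rm, r1, c0, c1);
    } else {
        size_t cm = c0 + cols / 2;
        transpose_rec(a, b, n, r0, r1, c0, cm);
        transpose_rec(a, b, n, r0, r1, cm, c1);
    }
}

void transpose(const double *a, double *b, size_t n) {
    transpose_rec(a, b, n, 0, n, 0, n);
}
```

Because the recursion adapts to any cache size without knowing it, the same code achieves locality at every level of the memory hierarchy; only the base-case cutoff (and, in a parallel version, the task-spawning threshold) remains as a knob for the autotuner.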