Sciweavers

110 search results - page 20 / 22
» Exploiting Vector Parallelism in Software Pipelined Loops
Sort
View
SOFTWARE
2011
13 years 2 months ago
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops
In the era of multicores, many applications that tend to require substantial compute power and data crunching (aka Throughput Computing Applications) can now be run on desktop PCs...
Chi-Keung Luk, Ryan Newton, William Hasenplaugh, M...
CASES
2008
ACM
13 years 9 months ago
SoC-C: efficient programming abstractions for heterogeneous multicore systems on chip
fficient Programming Abstractions for Heterogeneous Multicore Systems on Chip Alastair D. Reid Krisztian Flautner Edmund Grimley-Evans ARM Ltd Yuan Lin University of Michigan The ...
Alastair D. Reid, Krisztián Flautner, Edmun...
ARCS
2008
Springer
13 years 9 months ago
An Optimized ZGEMM Implementation for the Cell BE
: The architecture of the IBM Cell BE processor represents a new approach for designing CPUs. The fast execution of legacy software has to stand back in order to achieve very high ...
Timo Schneider, Torsten Hoefler, Simon Wunderlich,...
ISPASS
2007
IEEE
14 years 1 months ago
Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications
—Although SIMD extensions are a cost effective way to exploit the data level parallelism present in most media applications, we will show that they had have a very limited memory...
Mauricio Alvarez, Esther Salamí, Alex Ram&i...
CF
2008
ACM
13 years 9 months ago
Cell-SWat: modeling and scheduling wavefront computations on the cell broadband engine
This paper contributes and evaluates a model and a methodology for implementing parallel wavefront algorithms on the Cell Broadband Engine. Wavefront algorithms are vital in sever...
Ashwin M. Aji, Wu-chun Feng, Filip Blagojevic, Dim...