Sciweavers

556 search results - page 103 / 112
» Fast parallel GPU-sorting using a hybrid algorithm
Sort
View
ICS
2009
Tsinghua U.
14 years 3 months ago
High-performance regular expression scanning on the Cell/B.E. processor
Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in ever...
Daniele Paolo Scarpazza, Gregory F. Russell
ICS
2010
Tsinghua U.
13 years 11 months ago
Clustering performance data efficiently at massive scales
Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune ...
Todd Gamblin, Bronis R. de Supinski, Martin Schulz...
IPPS
2005
IEEE
14 years 2 months ago
Effective Instruction Prefetching via Fetch Prestaging
As technological process shrinks and clock rate increases, instruction caches can no longer be accessed in one cycle. Alternatives are implementing smaller caches (with higher mis...
Ayose Falcón, Alex Ramírez, Mateo Va...
ARCS
2008
Springer
13 years 10 months ago
An Optimized ZGEMM Implementation for the Cell BE
: The architecture of the IBM Cell BE processor represents a new approach for designing CPUs. The fast execution of legacy software has to stand back in order to achieve very high ...
Timo Schneider, Torsten Hoefler, Simon Wunderlich,...
EUROPAR
2008
Springer
13 years 10 months ago
Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Optimizing Compilers
Abstract. Understanding program behavior is at the foundation of program optimization. Techniques for automatic recognition of program constructs characterize the behavior of code ...
Manuel Arenaz, Pedro Amoedo, Juan Touriño