Sciweavers

555 search results - page 16 / 111
» Efficient event-driven simulation of parallel processor arch...
SPAA
1992
ACM
15 years 6 months ago
Subset Barrier Synchronization on a Private-Memory Parallel System
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in...
Anja Feldmann, Thomas R. Gross, David R. O'Hallaro...
ARCS
2008
Springer
15 years 4 months ago
Hybrid Parallel Sort on the Cell Processor
Sorting large data sets has always been an important application, and hence has been one of the benchmark applications on new parallel architectures. We present a parallel sortin...
Jörg Keller, Christoph W. Kessler, Kalle K&ou...
MICRO
2010
IEEE
15 years 21 days ago
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
Michael Steffen, Joseph Zambreno
TCAD
2002
15 years 2 months ago
An instruction-level energy model for embedded VLIW architectures
In this paper, an instruction-level energy model is proposed for the data-path of very long instruction word (VLIW) pipelined processors that can be used to provide accurate power ...
Mariagiovanna Sami, Donatella Sciuto, Cristina Sil...
VRIPHYS
2010
14 years 9 months ago
Asynchronous Preconditioners for Efficient Solving of Non-linear Deformations
In this paper, we present a set of methods to improve numerical solvers, as used in real-time non-linear deformable models based on implicit integration schemes. The proposed appr...
Hadrien Courtecuisse, Jérémie Allard...