Sciweavers

SPAA
1992
ACM
13 years 11 months ago
Subset Barrier Synchronization on a Private-Memory Parallel System
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in...
Anja Feldmann, Thomas R. Gross, David R. O'Hallaro...
ARCS
2008
Springer
13 years 9 months ago
Hybrid Parallel Sort on the Cell Processor
Sorting large data sets has always been an important application, and hence has been one of the benchmark applications on new parallel architectures. We present a parallel sortin...
Jörg Keller, Christoph W. Kessler, Kalle K&ou...
MICRO
2010
IEEE
13 years 5 months ago
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
Michael Steffen, Joseph Zambreno
TCAD
2002
13 years 7 months ago
An instruction-level energy model for embedded VLIW architectures
In this paper, an instruction-level energy model is proposed for the data-path of very long instruction word (VLIW) pipelined processors that can be used to provide accurate power ...
Mariagiovanna Sami, Donatella Sciuto, Cristina Sil...
VRIPHYS
2010
13 years 2 months ago
Asynchronous Preconditioners for Efficient Solving of Non-linear Deformations
In this paper, we present a set of methods to improve numerical solvers, as used in real-time non-linear deformable models based on implicit integration schemes. The proposed appr...
Hadrien Courtecuisse, Jérémie Allard...