Sciweavers

420 search results - page 83 / 84
» Scalable Parallel Programming with CUDA
Sort
View
MICRO
2010
IEEE
156views Hardware» more  MICRO 2010»
13 years 5 months ago
Explicit Communication and Synchronization in SARC
SARC merges cache controller and network interface functions by relying on a single hardware primitive: each access checks the tag and the state of the addressed line for possible...
Manolis Katevenis, Vassilis Papaefstathiou, Stamat...
ISCA
2009
IEEE
146views Hardware» more  ISCA 2009»
14 years 2 months ago
Multi-execution: multicore caching for data-similar executions
While microprocessor designers turn to multicore architectures to sustain performance expectations, the dramatic increase in parallelism of such architectures will put substantial...
Susmit Biswas, Diana Franklin, Alan Savage, Ryan D...
MICRO
2008
IEEE
107views Hardware» more  MICRO 2008»
14 years 1 months ago
A distributed processor state management architecture for large-window processors
— Processor architectures with large instruction windows have been proposed to expose more instruction-level parallelism (ILP) and increase performance. Some of the proposed arch...
Isidro Gonzalez, Marco Galluzzi, Alexander V. Veid...
CCGRID
2006
IEEE
14 years 1 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
IEEEPACT
2005
IEEE
14 years 28 days ago
A Distributed Control Path Architecture for VLIW Processors
VLIW architectures are popular in embedded systems because they offer high-performance processing at low cost and energy. The major problem with traditional VLIW designs is that t...
Hongtao Zhong, Kevin Fan, Scott A. Mahlke, Michael...