Sciweavers

420 search results - page 39 / 84
» Scalable Parallel Programming with CUDA
Sort
View
EUROPAR
2004
Springer
14 years 1 months ago
Targeting Heterogeneous Architectures in ASSIST: Experimental Results
Abstract. We describe how the ASSIST parallel programming environment can be used to run parallel programs on collections of heterogeneous workstations and evaluate the scalability...
Marco Aldinucci, Sonia Campa, Massimo Coppola, Sil...
ISCA
2005
IEEE
144views Hardware» more  ISCA 2005»
14 years 2 months ago
Scalable Load and Store Processing in Latency Tolerant Processors
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...
MICRO
2010
IEEE
153views Hardware» more  MICRO 2010»
13 years 6 months ago
Throughput-Effective On-Chip Networks for Manycore Accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...
Ali Bakhoda, John Kim, Tor M. Aamodt
ICPP
2008
IEEE
14 years 2 months ago
Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters
In this paper, we explore the use of hierarchically structured multiprocessor tasks (M-tasks) for programming multi-core cluster systems. These systems often have hierarchically s...
Jörg Dümmler, Thomas Rauber, Gudula R&uu...
IEEEPACT
2008
IEEE
14 years 2 months ago
Scalable and reliable communication for hardware transactional memory
In a hardware transactional memory system with lazy versioning and lazy conflict detection, the process of transaction commit can emerge as a bottleneck. This is especially true ...
Seth H. Pugsley, Manu Awasthi, Niti Madan, Naveen ...