Sciweavers

835 search results - page 17 / 167
» On optimal slicing of parallel programs
Sort
View
MICRO
2010
IEEE
153views Hardware» more  MICRO 2010»
13 years 5 months ago
Throughput-Effective On-Chip Networks for Manycore Accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...
Ali Bakhoda, John Kim, Tor M. Aamodt
PLDI
1995
ACM
13 years 11 months ago
Improving Balanced Scheduling with Compiler Optimizations that Increase Instruction-Level Parallelism
Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory lat...
Jack L. Lo, Susan J. Eggers
PPOPP
2010
ACM
14 years 4 months ago
Structure-driven optimizations for amorphous data-parallel programs
Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has pr...
Mario Méndez-Lojo, Donald Nguyen, Dimitrios...
ICS
2001
Tsinghua U.
14 years 2 days ago
Global optimization techniques for automatic parallelization of hybrid applications
This paper presents a novel technique to perform global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our sche...
Dhruva R. Chakrabarti, Prithviraj Banerjee
ICPP
1998
IEEE
13 years 12 months ago
Concurrent SSA Form in the Presence of Mutual Exclusion
Most current compiler analysis techniques are unable to cope with the semantics introduced by explicit parallel and synchronization constructs in parallel programs. In this paper ...
Diego Novillo, Ronald C. Unrau, Jonathan Schaeffer