Sciweavers

PLDI
2011
ACM
Automatic CPU-GPU communication management and optimization
The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is...
Thomas B. Jablin, Prakash Prabhu, James A. Jablin,...
PPOPP
2010
ACM
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
CLUSTER
2006
IEEE
A Performance Instrumentation Framework to Characterize Computation-Communication Overlap in Message-Passing Systems
Effective overlap of computation and communication is a well-understood technique for latency hiding and can yield significant performance gains for applications on high-end compu...
Aniruddha G. Shet, P. Sadayappan, David E. Bernhol...
PLDI
1998
ACM
Improving Performance by Branch Reordering
The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on de...
Minghui Yang, Gang-Ryung Uh, David B. Whalley
WSC
2008
Creating and using non-kinetic effects: Training joint forces for asymmetric operations
US military forces now face asymmetric military operations. Management of relationships with civilians is often crucial to success. Local population groups can provide critical in...
Hugh Henry, Robert G. Chamberlain