Abstract. Automatic performance analysis of parallel programs can be accomplished by scanning event traces of program execution for patterns representing inefficient behavior. The ...
Abstract. This paper presents an application-level non-blocking multicast scheme for dynamic DAG scheduling on large-scale distributedmemory systems. The multicast scheme takes int...
—Partitioned global address space (PGAS) languages, such as Unified Parallel C (UPC) have the promise of being productive. Due to the shared address space view that they provide,...
One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient algorithms have b...
Noncontiguous data access is a very common access pattern in many scientific applications. Using POSIX I/O to access many pieces of noncontiguous data segments will generate a lot...