CF
2004
ACM

Improving the execution time of global communication operations

Download www.tu-chemnitz.de

Many parallel applications from scientific computing use MPI global communication operations to collect or distribute data. Since the execution times of these operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of global communication operations can be significantly improved by a restructuring based on orthogonal processor structures. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster. On a Cray T3E, too, a significant improvement can be obtained by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data parallel implemen...
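
The abstract does not spell out the restructuring, so the following is only a minimal sketch of one plausible reading: the processors are arranged in a two-dimensional grid, and a broadcast is split into a phase along the root's column followed by independent phases along each row. The row-major grid layout and the names `grid_coords` and `orthogonal_bcast` are illustrative assumptions, not taken from the paper, and the code simulates the data movement in plain Python rather than calling MPI.

```python
# Sketch only: simulates a two-phase broadcast on a rows x cols processor grid.
# The grid layout and all names here are illustrative assumptions,
# not taken from the paper.


def grid_coords(rank, cols):
    """Map a linear rank to (row, col) in a row-major grid."""
    return divmod(rank, cols)


def orthogonal_bcast(value, root, rows, cols):
    """Return (buffers, leaders) after a simulated two-phase broadcast.

    Phase 1: the root sends to every rank in its own column (one rank per row).
    Phase 2: each of those ranks sends to the rest of its row; the row
             broadcasts are independent of each other.
    """
    size = rows * cols
    buffers = [None] * size
    buffers[root] = value
    _, root_col = grid_coords(root, cols)

    # Phase 1: broadcast within the root's column group.
    leaders = [r * cols + root_col for r in range(rows)]
    for leader in leaders:
        buffers[leader] = buffers[root]

    # Phase 2: each column member broadcasts within its own row group.
    for leader in leaders:
        row, _ = grid_coords(leader, cols)
        for c in range(cols):
            buffers[row * cols + c] = buffers[leader]

    return buffers, leaders


if __name__ == "__main__":
    bufs, leaders = orthogonal_bcast("payload", root=5, rows=3, cols=4)
    print(leaders)
    print(all(b == "payload" for b in bufs))
```

In an actual MPI program, the row and column groups would typically be created as sub-communicators (for example with `MPI_Comm_split`), and each phase would be an ordinary `MPI_Bcast()` on the smaller communicator. Which grid shape performs best on a given platform is exactly the kind of question the abstract says depends on a careful selection of the processor groups.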

Matthias Kühnemann, Thomas Rauber, Gudula Rünger

CF 2004 | Communication Operations | Execution Time | Global Communication Operations

Related Content

» A Communication Framework for Fault-Tolerant Parallel Execution

» Optimizing Synchronization Operations for Remote Memory Communication Systems

» Improving Federation Executions with Migrating HLA/RTI Central Runtime Components

» Optimizing MPI collective communication by orthogonal structures

» MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems

» Exploiting DMA to enable nonblocking execution in Decoupled Threaded Architecture

» Performance Portable Optimizations for Loops Containing Communication Operations

» Remote Attestation on Legacy Operating Systems With Trusted Platform Modules

» Real-Time Operating System Services for Realistic SystemC Simulation Models of Embedded Sys...

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	CF
Authors	Matthias Kühnemann, Thomas Rauber, Gudula Rünger
