This paper proposes a novel approach for optimizing the performance of all-to-all collective communication by taking advantage of the concurrency available in modern networks such as InfiniBand and Quadrics. Using the MPI AllGather operation as an example, we describe how network concurrency can be exploited in an optimized implementation of this operation. Compared to leading MPI implementations, for a 32-KB message on 128 processors our new algorithm yields a 65% improvement on the InfiniBand cluster at Virginia Tech and an 89% improvement on the Quadrics cluster at Pacific Northwest National Laboratory.

Categories and Subject Descriptors