In this paper we present a multiple-phase collective I/O operation for generic block-cyclic distributions. An inspector phase automatically generates the communication pattern, and an executor phase performs the communication and file access. The cost of the inspector phase can be amortized over several accesses. We show that our method outperforms other parallel I/O optimization techniques for small access granularities.
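As a rough illustration of the inspector/executor split, the following sketch is not the authors' implementation. It rests on several simplifying assumptions: a one-dimensional block-cyclic distribution, a file whose element offset equals the global index, and aggregators that each own one contiguous file domain, in the style of two-phase I/O. All function names (`owner_and_local`, `distribute`, `inspector`, `executor`) are hypothetical. The inspector builds the schedule once, and the executor replays it for every access. Reusing the schedule in this way is what allows the inspector's cost to be amortized.

```python
from collections import defaultdict


def owner_and_local(i, block, nprocs):
    """Map global index i to (owning process, local index) under a 1-D
    block-cyclic distribution with the given block size."""
    blk, off = divmod(i, block)
    return blk % nprocs, (blk // nprocs) * block + off


def distribute(values, block, nprocs):
    """Scatter a global array into per-process local arrays."""
    local = [[] for _ in range(nprocs)]
    for i, v in enumerate(values):
        p, li = owner_and_local(i, block, nprocs)
        assert li == len(local[p])  # local indices grow contiguously
        local[p].append(v)
    return local


def inspector(n, block, nprocs, naggr):
    """Build a reusable schedule mapping (process, aggregator) to the
    (local index, file offset) pairs that process must send.
    Hypothetical assumptions: file offset == global index, and the file
    is split into naggr contiguous domains."""
    domain = -(-n // naggr)  # ceil(n / naggr)
    schedule = defaultdict(list)
    for i in range(n):
        p, li = owner_and_local(i, block, nprocs)
        schedule[(p, i // domain)].append((li, i))
    return dict(schedule)


def executor(schedule, local_data, n, naggr):
    """Replay the schedule: fill each aggregator's contiguous buffer,
    standing in for the data exchange followed by one file write."""
    domain = -(-n // naggr)
    buffers = [[None] * min(domain, n - a * domain) for a in range(naggr)]
    for (p, a), pairs in schedule.items():
        for li, off in pairs:
            buffers[a][off - a * domain] = local_data[p][li]
    return buffers


if __name__ == "__main__":
    n, block, nprocs, naggr = 10, 2, 2, 2
    sched = inspector(n, block, nprocs, naggr)  # built once
    for values in ([10 * i for i in range(n)], [i * i for i in range(n)]):
        bufs = executor(sched, distribute(values, block, nprocs), n, naggr)
        print(bufs)  # each aggregator holds one contiguous file domain
```

In this example the schedule is reused across two executor calls with different data. This mirrors repeated accesses that share the same distribution.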
David E. Singh, Florin Isaila, Juan Carlos Pichel,