The behavior and performance of MPI non-blocking message passing operations are sensitive to implementation specifics because they depend heavily on available system-level buffers. In this paper, we investigate the behavior of non-blocking communication primitives provided by popular MPI implementations and propose strategies for these primitives that can reduce processor synchronization overheads. We also demonstrate the performance improvements obtained with these strategies in a parallel Structured Adaptive Mesh Refinement (SAMR) application.