Collective operations and non-blocking point-to-point operations are two central parts of MPI, each providing significant performance and programmability benefits. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. This dissertation studies non-blocking collective operations, integrating theory, practice, and application. We base our theoretical analyses on a well-understood network model, and we realize our communication operations as a portable library layered on MPI. A real-world quantum-mechanical application serves as the deployment and evaluation vehicle for our approach.