Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for the information to arrive. This is especially true for barrier, or global, synchronization, in which every processor must synchronize with every other processor. Nonetheless, barriers are the only form of synchronization explicitly supplied in OpenMP, and they occur whenever collective communication operations are used in MPI. Many applications do not actually require global synchronization; local synchronization, in which a processor synchronizes only with those processors from or to which it needs information or resources, is often adequate. However, when tasks take varying amounts of time, the behavior of a system under local synchronization is more difficult to analyze, since processors do not start tasks at the same time. We show that when the synchronization dependencies form a directed cycle and the task times are geometrically distributed with ...
Julia Lipman, Quentin F. Stout
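To make the contrast between the two forms of synchronization concrete, the following sketch (not taken from the paper) shows local synchronization on a directed cycle using MPI point-to-point messages: after finishing a task, each processor notifies its successor and waits only for its predecessor, rather than calling a global MPI_Barrier. The do_task stub, the NUM_TASKS constant, and the randomized task times are illustrative assumptions, not the paper's model.

    /* Sketch only: local synchronization on a directed cycle in MPI.
       do_task(), NUM_TASKS, and the task-time distribution are
       assumptions made for illustration. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NUM_TASKS 100  /* assumed number of tasks per processor */

    static void do_task(void) {
        /* Placeholder for a task whose duration varies from run to run. */
        usleep(1000 + rand() % 5000);
    }

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        srand((unsigned)rank + 1);

        int succ = (rank + 1) % size;         /* successor on the cycle   */
        int pred = (rank - 1 + size) % size;  /* predecessor on the cycle */

        for (int t = 0; t < NUM_TASKS; t++) {
            do_task();
            /* Local synchronization: tell the successor this task is done,
               and wait only for the predecessor before starting the next
               task. A global alternative would be
               MPI_Barrier(MPI_COMM_WORLD), which makes every processor
               wait for the slowest one. */
            int out = t, in;
            MPI_Sendrecv(&out, 1, MPI_INT, succ, 0,
                         &in,  1, MPI_INT, pred, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

In this setup a slow task delays only the processors downstream on the cycle, not all processors at once, which is exactly why the start times drift apart and the analysis becomes harder than in the barrier case.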