Brewer and Kuszmaul [BK94] demonstrated how barriers and traffic interleaving can alleviate the problem of bulk-transfer performance degradation on the Thinking Machines CM-5, by exploiting the observation that 1-on-1 communication avoids network congestion. We apply and extend these techniques on the Intel Paragon and MIT Alewife machines. Because these machines lack the CM-5's fast hardware support for barriers, we introduce a token-passing scheme that avoids barriers while maintaining 1-on-1 communication. We also introduce a new algorithm, distributed dynamic scheduling, that brings Brewer and Kuszmaul's observations to bear on irregular traffic patterns by massaging traffic into a sequence of near-permutations at runtime, without requiring any preprocessing or global state. The measured performance of our algorithm exceeds that of traffic interleaving (the most effective technique proposed in [BK94]) on all three platforms, and is comparable to the performance of static ...
Eric A. Brewer, Paul Gauthier, Armando Fox, Angela
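
To make the idea of 1-on-1 communication concrete, the following is a minimal sketch of a static rotation schedule for a regular all-to-all exchange, in which every round is a permutation: each processor sends to exactly one destination and receives from exactly one source. It is not the token-passing scheme or the distributed dynamic scheduling algorithm described in the abstract; it only illustrates what a sequence of permutations looks like. The function names `rotation_schedule` and `is_permutation` are hypothetical and chosen for illustration.

```python
def rotation_schedule(num_procs):
    """Build a static schedule for an all-to-all exchange.

    Returns a list of rounds. In each round, entry `src` holds the
    destination that processor `src` sends to. Round `shift` pairs
    processor `src` with `(src + shift) % num_procs`, so no two
    senders ever target the same receiver in the same round.
    """
    rounds = []
    for shift in range(1, num_procs):
        rounds.append([(src + shift) % num_procs for src in range(num_procs)])
    return rounds


def is_permutation(dests):
    """Check that every processor appears exactly once as a destination."""
    return sorted(dests) == list(range(len(dests)))


if __name__ == "__main__":
    for round_no, dests in enumerate(rotation_schedule(4), start=1):
        print(round_no, dests, is_permutation(dests))
```

A schedule like this is fixed in advance and assumes every processor has data for every other processor. The abstract's interest is in irregular traffic, where such a schedule is not known ahead of time and rounds can only approximate permutations.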