In the sub-micron technology era, wire delays are becoming much more important than gate delays, making it particularly attractive to go for clustered designs. A common form of cl...
All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on si...
The increasing demand for computational cycles is being met by the use of multi-core processors. Having large number of cores per node necessitates multi-core aware designs to ext...
Krishna Chaitanya Kandalla, Hari Subramoni, Gopala...
This paper describes improvements to the Mach microkernel’s support for efficient application startup across multiple nodes in a cluster or massively parallel processor. Signifi...
Dejan S. Milojicic, David L. Black, Steven J. Sear...