This paper describes improvements to the Mach microkernel’s support for efficient application startup across multiple nodes in a cluster or massively parallel processor. Significant improvements in application startup times have been achieved by optimizing the existing remote task creation operation, implementing a facility to concurrently create multiple remote tasks in a single operation, and restructuring the underlying distributed virtual memory system to improve its scalability. One component of the restructuring is the use of a hierarchical tree of objects to implement the paging path instead of a flat, single-level tree; this eliminates bottlenecks at the node that initiates the application. The other component limits the copy-on-write virtual memory optimization to single-node operations; this separates network sharing (read/write) from network read access (implemented by copy-on-reference). Although our implementation is specific to Mach, t...
Dejan S. Milojicic, David L. Black, Steven J. Sear
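
The scalability argument for the hierarchical paging tree can be illustrated with a small model. The sketch below is not the Mach implementation; it only counts how many nodes page directly from each parent under a flat, single-level tree versus a hierarchical tree. The function names, the fan-out parameter, and the breadth-first numbering of nodes are all assumptions made for illustration.

```python
# Illustrative model only: node 0 stands for the node that initiates the
# application; the other nodes must obtain pages from some parent.

def flat_root_load(num_nodes):
    """Flat, single-level tree: every other node pages directly from node 0."""
    return num_nodes - 1


def build_tree(num_nodes, fanout):
    """Hierarchical tree (assumed layout): node i pages from node (i - 1) // fanout."""
    children = {i: [] for i in range(num_nodes)}
    for i in range(1, num_nodes):
        children[(i - 1) // fanout].append(i)
    return children


def max_load(children):
    """Largest number of direct children served by any single node."""
    return max(len(c) for c in children.values())


def tree_depth(num_nodes, fanout):
    """Number of hops from the last (deepest) node up to node 0."""
    depth, i = 0, num_nodes - 1
    while i > 0:
        i = (i - 1) // fanout
        depth += 1
    return depth


if __name__ == "__main__":
    n, k = 64, 4  # hypothetical cluster size and fan-out
    tree = build_tree(n, k)
    print("flat root load:", flat_root_load(n))
    print("hierarchical root load:", len(tree[0]))
    print("hierarchical max load:", max_load(tree))
    print("hierarchical depth:", tree_depth(n, k))
```

In this model the load on the initiating node grows linearly with the number of nodes in the flat case, while in the hierarchical case it is bounded by the fan-out, at the cost of a paging path that is several hops deep.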