As applications grow larger and the load on high-performance systems increases, the I/O bottleneck must be tackled from several angles. It is essential not only to optimize the I/O accesses of each individual application, but also to identify and exploit the opportunities that arise when datasets are shared across applications. Clusters are rapidly becoming the platform of choice for demanding applications because of their cost-effectiveness and widespread deployment. Consequently, this paper attempts to optimize data sharing across applications executing concurrently on a cluster. Specifically, we propose and implement a kernel-level caching module at each node of a Linux cluster that can serve multiple processes belonging to different applications. Using detailed evaluations on an actual Linux cluster, this paper demonstrates the benefits of this module in optimizing both intra- and inter-application I/O requests.
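The key property of a node-wide cache is that cached blocks are keyed by file and block rather than by process, so data fetched for one application can satisfy a later request from another. The following is a minimal user-space sketch of that idea, assuming an LRU replacement policy and a hypothetical `fetch` callback standing in for the slow path to disk or a remote I/O node. It is an illustration only, not the authors' kernel module: the class name, the LRU policy, and the intra/inter hit counters are all hypothetical choices made for this example.

```python
from collections import OrderedDict


class SharedBlockCache:
    """Hypothetical node-wide block cache shared by all local processes.

    Blocks are keyed by (file_id, block_no), not by process, so a block
    loaded for one application can serve a later request from another.
    """

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        # (file_id, block_no) -> (data, app that first loaded it), in LRU order
        self.blocks = OrderedDict()
        self.misses = 0
        self.intra_hits = 0  # hit on a block this same app loaded earlier
        self.inter_hits = 0  # hit on a block another app loaded

    def read(self, app, file_id, block_no, fetch):
        key = (file_id, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            data, loader = self.blocks[key]
            if loader == app:
                self.intra_hits += 1
            else:
                self.inter_hits += 1
            return data
        self.misses += 1
        data = fetch(file_id, block_no)  # slow path: assumed disk / I/O node read
        self.blocks[key] = (data, app)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return data


# Usage: two applications read the same four blocks of one dataset.
disk_reads = []


def fetch(file_id, block_no):
    disk_reads.append((file_id, block_no))
    return f"{file_id}:{block_no}".encode()


cache = SharedBlockCache(capacity_blocks=8)
for b in range(4):
    cache.read("appA", "dataset.dat", b, fetch)
for b in range(4):
    cache.read("appB", "dataset.dat", b, fetch)
print(cache.misses, cache.inter_hits, len(disk_reads))  # 4 4 4
```

Here the second application's reads are all served from the shared cache, so the dataset is read from storage only once. A per-process cache would have issued eight storage reads instead.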
Murali Vilayannur, Mahmut T. Kandemir, Anand Sivas