We describe how to manage distributed file system caches based on groups of files that are accessed together. We use file access patterns to automatically construct dynamic groupings of files, and then manage our cache by fetching groups rather than single files. We present experimental results, based on trace-driven workloads, demonstrating that grouping improves cache performance. At the file system client, grouping can reduce LRU demand fetches by 50% to 60%. At the server, cache hit rate improvements are much more pronounced but vary widely (from 20% to over 1200%), depending on the capacity of intervening caches. Our treatment includes information-theoretic results that justify our approach to file grouping.
Ahmed Amer, Darrell D. E. Long, Randal C. Burns
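
As a rough illustration of the idea summarized above, the following is a minimal sketch of an LRU cache that fetches a group of files on each miss. It is not the authors' implementation: the grouping rule (chaining each file's most frequently observed immediate successor), the class name `GroupingLRUCache`, and the parameters `capacity` and `group_size` are all assumptions made for this example.

```python
from collections import Counter, OrderedDict, defaultdict


class GroupingLRUCache:
    """Hypothetical LRU cache that fetches predicted groups on a miss."""

    def __init__(self, capacity, group_size=2):
        self.capacity = capacity
        self.group_size = group_size      # 1 behaves like plain LRU
        self.cache = OrderedDict()        # least recently used entry first
        self.successors = defaultdict(Counter)  # file -> counts of next file
        self.prev = None
        self.demand_fetches = 0

    def _insert(self, name):
        # Place (or refresh) an entry at the MRU end, evicting from the LRU end.
        self.cache[name] = True
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def _group(self, name):
        # Assumed grouping rule: follow the most frequent successor chain.
        group = [name]
        current = name
        while len(group) < self.group_size:
            candidates = [f for f, _ in self.successors[current].most_common()
                          if f not in group]
            if not candidates:
                break
            current = candidates[0]
            group.append(current)
        return group

    def access(self, name):
        """Record an access; return True on a hit, False on a demand fetch."""
        if self.prev is not None:
            self.successors[self.prev][name] += 1
        self.prev = name
        if name in self.cache:
            self.cache.move_to_end(name)
            return True
        self.demand_fetches += 1
        # Insert predicted members first so the demanded file ends up MRU.
        for member in reversed(self._group(name)):
            self._insert(member)
        return False


def count_demand_fetches(trace, capacity, group_size):
    cache = GroupingLRUCache(capacity, group_size)
    for name in trace:
        cache.access(name)
    return cache.demand_fetches


trace = ["a", "b", "c"] * 4  # synthetic cyclic trace, not from the paper
plain = count_demand_fetches(trace, capacity=2, group_size=1)
grouped = count_demand_fetches(trace, capacity=2, group_size=2)
print(plain, grouped)
```

On this synthetic trace, plain LRU misses on every access because the working set is one file larger than the cache, while the grouped variant avoids some of those demand fetches once it has observed the successor pattern. The trace is contrived to make that effect visible and says nothing about the magnitudes reported for real workloads.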