Replication is a key technique for improving fault tolerance. Replication can also improve application performance under some circumstances, but can have the opposite effect under others. In this paper we focus on a class of Grid applications (long-running, compute-intensive, and write-mostly) and develop a calculus that takes into consideration the I/O characteristics of applications and failure behavior of distributed storage nodes to prescribe a file system replication strategy that maximizes the utilization of computational resources.

October 8, 2007

Center for Information Technology Integration
University of Michigan
535 W. William St., Suite 3100
Ann Arbor, MI 48103-4978