File Clustering Based Replication Algorithm in a Grid Environment



Replication in grid file systems can significantly improve the I/O performance of data-intensive applications. However, most existing replication techniques operate on individual files, which can introduce inefficient replication overhead when a large number of files is involved. We propose a file clustering based replication algorithm for grid file systems. Our algorithm groups files according to how often they are accessed simultaneously, and stores replicas of the clustered files on storage nodes so that, under the given storage capacity limit, the expected read access times for most future accesses to the clustered files and the replication times for individual files are minimized. Our experiments on a grid environment of 20 nodes across 5 sites suggest that the proposed algorithm achieves accurate file clustering and efficient replica management; our clustering policy with the file cluster size limit of 5120 MB and storage capacity limit

Hitoshi Sato, Satoshi Matsuoka, Toshio Endo
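The abstract describes two steps, grouping files by simultaneous access and placing cluster replicas under a storage capacity limit, without giving the exact procedure. The following is a minimal sketch under stated assumptions, not the authors' algorithm: "simultaneous access" is approximated by counting file pairs that appear in the same job session, clusters are merged greedily with union-find along the strongest co-access pairs subject to a cluster size cap, and each node picks clusters with a greedy reads-per-MB knapsack heuristic. All function names (`co_access_counts`, `cluster_files`, `place_clusters`) and the per-node demand format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_access_counts(sessions):
    """Count how often each unordered pair of files is accessed in the same session.
    Assumption: a 'session' (e.g. one job) stands in for simultaneous access."""
    counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_files(sessions, sizes, size_limit_mb, min_count=1):
    """Greedily merge clusters along the strongest co-access pairs, never letting
    a cluster's total size exceed size_limit_mb. Every accessed file must be in sizes."""
    parent = {f: f for f in sizes}  # union-find forest: file -> parent file
    total = dict(sizes)             # cluster root -> total cluster size in MB

    def find(f):
        # Path halving keeps the union-find trees shallow.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    # Strongest pairs first; ties broken by file names for determinism.
    pairs = sorted(co_access_counts(sessions).items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), count in pairs:
        if count < min_count:
            break
        ra, rb = find(a), find(b)
        if ra != rb and total[ra] + total[rb] <= size_limit_mb:
            parent[rb] = ra
            total[ra] += total[rb]

    clusters = defaultdict(list)
    for f in sizes:
        clusters[find(f)].append(f)
    return sorted(sorted(c) for c in clusters.values())


def place_clusters(clusters, sizes, capacity_mb, demand):
    """For each node, replicate whole clusters ranked by expected reads per MB
    until the node's storage capacity (hypothetical, uniform) is used up.
    demand: node -> {file: expected read count from that node}."""
    placement = {}
    for node, reads in demand.items():
        scored = []
        for cluster in clusters:
            expected = sum(reads.get(f, 0) for f in cluster)
            size = sum(sizes[f] for f in cluster)
            if expected > 0:
                scored.append((expected / size, cluster, size))
        scored.sort(key=lambda t: -t[0])  # best reads-per-MB first
        used, chosen = 0, []
        for _, cluster, size in scored:
            if used + size <= capacity_mb:
                chosen.append(cluster)
                used += size
        placement[node] = chosen
    return placement
```

For example, with the 5120 MB cluster size limit from the abstract and a hypothetical 4000 MB per-node capacity, `cluster_files([["a","b"],["a","b","c"],["d","e"],["d","e"],["c","d"]], {"a":1000,"b":2000,"c":3000,"d":1500,"e":1500}, 5120)` groups `a` with `b` and `d` with `e`, but leaves `c` alone because merging it would exceed the limit. Replicating whole clusters rather than single files is what lets one replication decision cover a set of files that are likely to be read together.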
