
HPDC
2008
IEEE

File grouping for scientific data management: lessons from experimenting with real traces

Abstract: The analysis of data usage in a large set of real traces from a high-energy physics collaboration revealed an emergent grouping of files that we coined "filecules". This paper presents the benefits of using this file grouping for prestaging data and compares it with previously proposed file grouping techniques along a range of performance metrics. Our experiments with real workloads demonstrate that the filecule is a reliable and useful abstraction for data management in science Grids; that preserving time locality for data prestaging is highly recommended; that job reordering with respect to data availability has a significant impact on throughput; and finally, that a relatively short history of traces is a good predictor for filecule grouping. Our experimental results provide lessons for workload modeling and suggest design guidelines for data management in data-intensive resource-sharing environments.
Shyamala Doraimani, Adriana Iamnitchi
Added 08 Dec 2010
Updated 08 Dec 2010
Type Conference
Year 2008
Where HPDC
Authors Shyamala Doraimani, Adriana Iamnitchi