Providing up-to-date input to users’ applications is an important data management problem in a distributed computing environment, where each data storage location and intermediate node may have its own set of available data, storage limitations, and communication links. Sites in the network request data items, and each request has an associated deadline and priority. This work concentrates on a basic version of the data staging problem in which all parameter values for the communication system and the data request information represent the best known information collected so far and stay fixed throughout the scheduling process. The network is assumed to be oversubscribed, so not all requests for data items can be satisfied. A mathematical model for the basic data staging problem is given. Then, three heuristics based on a multiple-source shortest-path algorithm are presented for finding a near-optimal schedule of the communication steps for staging the data. Each heuristi...
Mitchell D. Theys, Noah Beck, Howard Jay Siegel, M