The scale at which scientific data are produced will change dramatically in the near future. Sophisticated scientific laboratories and newly installed sensor networks will produce large amounts of data. Research in protein crystallography, for instance, can produce hundreds of terabytes of data from a single crystallography beamline. These data have to be saved for future use and made available to researchers for collaborative use. A framework is therefore needed that can store such data volumes. The framework should also handle disparate data sources that are tightly integrated with users' applications, as well as large data streams arising from instruments and sensors. This paper presents an initial study of a framework for servicing large dynamic data sets over a national grid for eResearch.
A. B. M. Russel, Asad I. Khan