Simulations, experiments and observatories are generating a deluge of scientific data. Even more staggering is the ever-growing application demand to process and assimilate these datasets. Application users perform a range of data operations, and collaborate and share data in many novel ways. The current storage landscape is struggling to keep up with these trends in scientific data processing. Application users pay the price in the form of overcrowded shared filesystems, expensive storage area networks, insufficient local storage, and high-latency archival or wide-area transfers. In order to sustain and maximize I/O bandwidth relative to increasing CPU speeds, applications must take advantage of large amounts of intermediate commodity storage. However, intermediate storage presents new challenges above and beyond the traditional distributed filesystem paradigm: persistent scheduling, storage/CPU co-allocation, namespace management, lifetime management, and novel application interfaces. In...
Sudharshan S. Vazhkudai, Douglas Thain, Xiaosong M