Advanced instruments in a variety of scientific domains are collecting massive amounts of data that must be postprocessed and organized to support research activities. Astronomers have been pioneers in the use of databases to host sky survey data. Increasing data volumes from more powerful telescopes pose enormous challenges to state-ofthe-art database systems and data-loading techniques. In this paper we present SkyLoader, our novel framework for data loading that is being used to populate a multi-table, multi-terabyte database repository for the Palomar-Quest sky survey. SkyLoader consists of an efficient algorithm for bulk loading, an effective data structure to support data integrity, optimized parallelism, and guidelines for system tuning. Performance studies show the positive effects of these techniques, with load time for a 40-gigabyte data set reduced from over 20 hours to less than 3 hours. Our framework offers a promising approach for loading other large and complex scientif...
Y. Dora Cai, Ruth A. Aydt, Robert Brunner