Big data presents new challenges to both cluster infrastructure software and parallel application design. We present a set of software services and design principles for data intensive computing with petabyte data sets, named GrayWulf . These services are intended for deployment on a cluster of commodity servers similar to the well-known Beowulf clusters. We use the Pan-STARRS system currently under development as an example of the architecture and principles in action.
Yogesh Simmhan, Roger S. Barga, Catharine van Inge