In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two models, namely, "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, making scheduling of I/O requests simpler but creating a possibility of contention at the I/O nodes. In the second approach, processors are grouped into several groups, so that only one group performs I/O simultaneously, while the next group performs communication to rearrange data, and this entire process is pipelined to reduce I/O node contention dynamically. Both models have been optimized by using software caching, chunking and on-line compression mechanisms. We demonstrate that we can obtain significantly highperformance for I/O above what has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine a...