Data warehouses and recording systems typically receive a large, continuous stream of incoming data, which must be stored in a manner suitable for future access. Access to stored records is usually based on a key. Organizing the data on disk as it arrives using standard techniques would result in either (a) one or more I/Os to store each incoming record (to keep the data clustered by key), which is too expensive when data arrival rates are very high, or (b) many I/Os to locate the records for a particular customer (if the data is stored clustered by arrival order). We study two techniques, inspired by external sorting algorithms, for storing data incrementally as it arrives while providing good performance for both recording and querying. We present concurrency control and recovery schemes for both techniques. We show the benefits of our techniques both analytically and experimentally.
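The abstract does not spell out the two techniques, so the following is only a minimal sketch of the general idea of borrowing from external sorting; it is not the authors' method. It assumes an in-memory buffer that is flushed as a sorted run once it fills, and a merge of runs `fan_in` at a time per level, much like a multiway merge pass in an external sort. The class `SteppedRunStore` and its parameters `buffer_size` and `fan_in` are hypothetical names. Records arrive in arbitrary key order, yet each flush or merge writes a whole sorted run sequentially, so no per-record random I/O is needed. A key lookup checks the buffer and binary-searches every run, and each level holds fewer than `fan_in` runs.

```python
import bisect
import heapq
from typing import Any, Dict, List, Tuple

Record = Tuple[int, Any]  # (key, payload); hypothetical record layout


class SteppedRunStore:
    """Hypothetical sketch: buffer arrivals, flush sorted runs, merge K runs per level."""

    def __init__(self, buffer_size: int = 4, fan_in: int = 3) -> None:
        self.buffer_size = buffer_size  # records held in memory before a flush
        self.fan_in = fan_in            # runs merged together at each level (K)
        self.buffer: List[Record] = []
        self.levels: Dict[int, List[List[Record]]] = {}

    def insert(self, key: int, payload: Any) -> None:
        self.buffer.append((key, payload))
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self) -> None:
        # Sort the buffer by key and emit it as one run (a single sequential write).
        run = sorted(self.buffer, key=lambda r: r[0])
        self.buffer = []
        self._add_run(0, run)

    def _add_run(self, level: int, run: List[Record]) -> None:
        runs = self.levels.setdefault(level, [])
        runs.append(run)
        if len(runs) == self.fan_in:
            # Multiway merge of K sorted runs, as in an external-sort merge pass.
            merged = list(heapq.merge(*runs, key=lambda r: r[0]))
            self.levels[level] = []
            self._add_run(level + 1, merged)

    def lookup(self, key: int) -> List[Any]:
        # Unsorted buffer: linear scan. Sorted runs: binary search, then scan equal keys.
        results = [p for k, p in self.buffer if k == key]
        for runs in self.levels.values():
            for run in runs:
                # (key,) sorts just before every (key, payload), so payloads are never compared.
                i = bisect.bisect_left(run, (key,))
                while i < len(run) and run[i][0] == key:
                    results.append(run[i][1])
                    i += 1
        return results

    def run_count(self) -> int:
        return sum(len(runs) for runs in self.levels.values())


store = SteppedRunStore(buffer_size=4, fan_in=3)
for i in range(36):
    store.insert(i % 5, i)          # keys arrive out of order
print(store.run_count())            # 1
print(sorted(store.lookup(2)))      # [2, 7, 12, 17, 22, 27, 32]
```

In this sketch, 36 records produce 9 flushes. Every third level-0 run triggers a merge into level 1, and the third level-1 run triggers a merge into level 2, leaving a single fully sorted run. Raising `fan_in` reduces merge work per record but leaves more runs for a query to probe. That tension between recording cost and query cost is the one the abstract describes.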
H. V. Jagadish, P. P. S. Narayan, S. Seshadri, S.