There is an increasing trend towards distributed and shared repositories for storing scientific datasets. Developing applications that retrieve and process data from such repositories involves a number of challenges. First, these data repositories store data in complex, low-level formats, which should be abstracted from application developers. Second, as data repositories are shared resources, part of the computation on the data must be performed on a different set of machines than the ones hosting the data. Third, because of the volume of data and the amount of computation involved, parallel configurations need to be used both for hosting the data and for processing the retrieved data. In this paper, we describe a system for executing SQL-3 queries over scientific data stored as flat files. A relational table-based virtual view is supported on these flat-file datasets. The class of queries we consider involves data retrieval using Select and Where clauses, and processing with user-d...
Li Weng, Gagan Agrawal, Ümit V. Çataly