Abstract. Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler that processes data-intensive applications written in a dialect of Java and compiles them for efficient execution on clusters of workstations or distributed memory machines. In this paper, we focus on data-intensive applications with two important properties: 1) data elements have spatial coordinates associated with them, and the distribution of the data is not regular with respect to these coordinates, and 2) the application processes only a subset of the available data, selected on the basis of spatial coordinates. These applications arise in many domains, such as satellite data processing and medical imaging. We present a general compilation and execution strategy for this class of applications which achieves high locality in disk accesses. We then present a technique for hoisting conditionals which further improves efficiency in execution of...
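The abstract does not spell out the hoisting technique, so the following is only a generic sketch of the underlying idea, under stated assumptions: data elements are assumed to be grouped into disk chunks, each with a precomputed bounding box, and the spatial subset is assumed to be an axis-aligned query box. The class and method names (`HoistSketch`, `Chunk`, `Box`, `chunkOf`, `countNaive`, `countHoisted`) are hypothetical. Instead of evaluating the range conditional on every element, the test is lifted to the chunk level: chunks entirely outside the query are skipped, chunks entirely inside need no per-element test, and only chunks that straddle the boundary fall back to the element-wise check.

```java
import java.util.List;

public class HoistSketch {
    // Hypothetical chunk: points stored together on disk, plus their bounding box.
    record Chunk(double[] xs, double[] ys, double minX, double minY, double maxX, double maxY) {}

    // Hypothetical axis-aligned query region selecting the spatial subset.
    record Box(double minX, double minY, double maxX, double maxY) {
        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
        // True if the chunk's bounding box lies entirely inside this box.
        boolean containsBox(Chunk c) {
            return c.minX() >= minX && c.maxX() <= maxX && c.minY() >= minY && c.maxY() <= maxY;
        }
        // True if the chunk's bounding box does not overlap this box at all.
        boolean disjointFrom(Chunk c) {
            return c.maxX() < minX || c.minX() > maxX || c.maxY() < minY || c.minY() > maxY;
        }
    }

    // Builds a chunk and computes its bounding box from the coordinates.
    static Chunk chunkOf(double[] xs, double[] ys) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < xs.length; i++) {
            minX = Math.min(minX, xs[i]); maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]); maxY = Math.max(maxY, ys[i]);
        }
        return new Chunk(xs, ys, minX, minY, maxX, maxY);
    }

    // Baseline: the range conditional is evaluated for every element.
    static int countNaive(List<Chunk> chunks, Box q) {
        int count = 0;
        for (Chunk c : chunks)
            for (int i = 0; i < c.xs().length; i++)
                if (q.contains(c.xs()[i], c.ys()[i])) count++;
        return count;
    }

    // Hoisted: the conditional is decided once per chunk where possible.
    static int countHoisted(List<Chunk> chunks, Box q) {
        int count = 0;
        for (Chunk c : chunks) {
            if (q.disjointFrom(c)) continue;                              // chunk never needs to be read
            if (q.containsBox(c)) { count += c.xs().length; continue; }   // no per-element test
            for (int i = 0; i < c.xs().length; i++)                       // boundary chunk only
                if (q.contains(c.xs()[i], c.ys()[i])) count++;
        }
        return count;
    }
}
```

Both methods return the same count; the hoisted version simply evaluates far fewer conditionals when most chunks lie wholly inside or wholly outside the query region. Counting stands in for whatever per-element processing an application actually performs.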