In this work we present the results of a project aimed at assembling a hybrid massively parallel machine, the PQE1 prototype, devoted to the simulation of complex physical models. An analysis of some existing parallel architectures has revealed that general-purpose machines are largely over-dimensioned and often perform inefficiently in grand-challenge scientific applications. We have thus developed a heterogeneous parallel system which matches task heterogeneity with architecture heterogeneity: special-purpose massively parallel architectures, when coupled to general-purpose machines, can efficiently satisfy the requirements of complex scientific computing. We present the hardware structure and the software tools developed for the PQE1 prototype. Starting from the concepts of machine granularity and task granularity, we show the need to exploit both high-granularity and low-granularity parallelism to use the PQE1 system efficiently. Some examples describing applicat...