In this paper we present an efficient algorithm, called Dynamic Critical Path Scheduling (DCPS), for compile-time scheduling and clustering of parallel programs onto distributed-memory parallel processing systems. DCPS is superior to several other algorithms from the literature in terms of computational complexity, processor consumption, and solution quality. DCPS has a time complexity of O(e + v log v), compared with O((e + v) log v) for the DSC algorithm, the best previously known algorithm. Experimental results demonstrate the superiority of DCPS over the DSC algorithm.
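To make the gap between the two complexity bounds concrete, the following minimal sketch evaluates the growth terms e + v log v and (e + v) log v for a few graph sizes. It assumes that v and e denote the numbers of nodes and edges of the task graph, ignores constant factors, and uses base-2 logarithms; the function names and the sample graph sizes are hypothetical and chosen only for illustration. It compares the bounds only and does not implement either algorithm.

```python
import math


def dcps_bound(v: int, e: int) -> float:
    """Growth term e + v*log2(v) (constant factors ignored)."""
    return e + v * math.log2(v)


def dsc_bound(v: int, e: int) -> float:
    """Growth term (e + v)*log2(v) (constant factors ignored)."""
    return (e + v) * math.log2(v)


if __name__ == "__main__":
    # Illustrative sizes only: a sparse graph (e = 2v) and a dense one
    # (e = v(v-1)/2, the maximum edge count of a DAG on v nodes).
    for v in (64, 1024):
        for label, e in (("sparse", 2 * v), ("dense", v * (v - 1) // 2)):
            ratio = dsc_bound(v, e) / dcps_bound(v, e)
            print(f"v={v:5d} e={e:7d} ({label}): bound ratio = {ratio:.2f}")
```

Under these assumptions the ratio of the two terms stays modest for sparse graphs and approaches log v as the graph becomes dense, since the edge term then dominates both expressions.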