Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying, and it takes advantage of task dependence properties to ensure correct execution. We demonstrate the application of this scheme to sparse LU and Cholesky factorizations, for which actual speedups have been hard to obtain in the literature because parallelism in these problems is irregular and limited. Our experiments on a Meiko CS-2 show promising results for our approach in exploiting irregular task parallelism with mixed granularities.
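
To make the DAG task model concrete, the sketch below shows dependence-driven execution in its simplest form: each task tracks a count of unresolved predecessors, and completing a task releases any successors whose count drops to zero. This is only an illustrative, single-process sketch under assumed names (`Task`, `add_edge`, a FIFO ready queue); it is not the distributed-memory run-time system or communication mechanism described in the paper.

```c
/*
 * Illustrative sketch of dependence-driven DAG execution (not the
 * paper's run-time system): each task counts its unresolved
 * predecessors; finishing a task decrements its successors' counts,
 * and tasks whose count reaches zero become ready to run.
 */
#include <stdio.h>

#define MAX_TASKS 16
#define MAX_SUCCS 4

typedef struct {
    const char *name;
    int num_preds;              /* unresolved incoming dependences */
    int num_succs;
    int succs[MAX_SUCCS];       /* indices of dependent tasks      */
} Task;

static Task tasks[MAX_TASKS];
static int ready[MAX_TASKS];    /* simple FIFO of ready task ids   */
static int head = 0, tail = 0;

static void add_edge(int from, int to)
{
    tasks[from].succs[tasks[from].num_succs++] = to;
    tasks[to].num_preds++;
}

int main(void)
{
    /* Example DAG: A -> B, A -> C, B -> D, C -> D */
    const char *names[] = { "A", "B", "C", "D" };
    int n = 4;
    for (int i = 0; i < n; i++)
        tasks[i] = (Task){ names[i], 0, 0, {0} };
    add_edge(0, 1); add_edge(0, 2); add_edge(1, 3); add_edge(2, 3);

    /* Seed the ready queue with tasks that have no predecessors. */
    for (int i = 0; i < n; i++)
        if (tasks[i].num_preds == 0)
            ready[tail++] = i;

    /* Run tasks as their dependences resolve; in a distributed
     * setting, successor notification would be a message instead
     * of a local counter update. */
    while (head < tail) {
        int t = ready[head++];
        printf("executing task %s\n", tasks[t].name);
        for (int s = 0; s < tasks[t].num_succs; s++) {
            int succ = tasks[t].succs[s];
            if (--tasks[succ].num_preds == 0)
                ready[tail++] = succ;
        }
    }
    return 0;
}
```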