Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed Memory Machines