We describe a software solution to the problem of automatically parallelizing linear algebra code on multiprocessor and multi-core architectures. This solution relies on a domain-specific language for matrix computations, a performance model for multiprocessor architectures, and the implementation of this solution using C++ template metaprogramming. Experimental results on sample computation kernels assess this model and its implementation.