This paper proposes a high-performance least-squares solver on FPGAs using the Cholesky decomposition method. The design is realized by iteratively reusing a single triangular linear equation solver for both the modified Cholesky decomposition and the forward/backward substitutions. Performance is gained by optimizing the Cholesky factorization algorithm and reordering the computation to alleviate data dependencies. A dedicated hardware architecture for solving triangular linear equations is designed and implemented for different precision requirements. Compared to software on a Pentium 4, our design achieves a significant speedup.
Depeng Yang, Gregory D. Peterson, Husheng Li, Junq
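
The computational flow summarized in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design. It assumes that "modified Cholesky" means the square-root-free factorization S = L D L^T and that the least-squares problem is solved through the normal equations A^T A x = A^T b; the abstract states neither explicitly. All function names are hypothetical.

```python
def ldl_decompose(S):
    """Square-root-free Cholesky (assumed variant): S = L * D * L^T.

    L is unit lower triangular, D is returned as a list of diagonal entries.
    S must be symmetric positive definite.
    """
    n = len(S)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = S[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            acc = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (S[i][j] - acc) / D[j]
    return L, D


def forward_substitute(L, c):
    """Solve L y = c for unit lower triangular L."""
    n = len(c)
    y = [0.0] * n
    for i in range(n):
        y[i] = c[i] - sum(L[i][k] * y[k] for k in range(i))
    return y


def backward_substitute_transposed(L, z):
    """Solve L^T x = z for unit lower triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


def least_squares(A, b):
    """Minimize ||A x - b||_2 via the normal equations (assumed formulation)."""
    m, n = len(A), len(A[0])
    # Form S = A^T A and c = A^T b.
    S = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[r][i] * b[r] for r in range(m)) for i in range(n)]
    L, D = ldl_decompose(S)
    y = forward_substitute(L, c)           # L y = c
    z = [y[i] / D[i] for i in range(n)]    # D z = y
    return backward_substitute_transposed(L, z)  # L^T x = z


# Usage: fit y = x0 + x1 * t to four points lying on y = 1 + 2t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
x = least_squares(A, b)
```

In this sketch the factorization and both substitution passes have the same row-by-row triangular structure, which is consistent with the abstract's point that a single triangular solver can be reused for every stage.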