We propose an application-specific processor for computational quantum chemistry. The kernel of interest is the computation of electron repulsion integrals (ERIs), whose control flow varies with the input data. This lack of uniformity limits the data-level parallelism (DLP) inherent in the application, apparently rendering a SIMD architecture infeasible. However, all ERIs may be computed in parallel, so there is abundant thread-level parallelism (TLP). We observe that it is possible to match threads with certain characteristics in a manner that reveals significant DLP across multiple threads. Our thread matching and scheduling scheme effectively converts TLP to DLP, allowing SIMD processing that was previously infeasible. We envision that this approach may expose DLP in other applications traditionally considered poor candidates for SIMD computation.
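The idea of converting TLP to DLP by matching threads can be illustrated with a minimal sketch. It is not the scheme proposed here; it only assumes that each independent ERI task carries some signature that determines its control flow, so that tasks with equal signatures can execute in lockstep on SIMD lanes. The function name `match_threads`, the signature labels, and the `simd_width` parameter are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def match_threads(tasks, simd_width):
    """Group independent tasks by control-flow signature, then pack each
    group into batches of at most `simd_width` tasks.

    `tasks` is an iterable of (task_id, signature) pairs. The signature is
    a hypothetical stand-in for whatever property decides a task's control
    flow; tasks sharing a signature are assumed to follow the same path.
    """
    groups = defaultdict(list)
    for task_id, signature in tasks:
        groups[signature].append(task_id)

    batches = []
    for signature, ids in groups.items():
        # Each slice is one SIMD batch: same control flow, different data.
        for start in range(0, len(ids), simd_width):
            batches.append((signature, ids[start:start + simd_width]))
    return batches


# Illustrative input: six independent tasks with made-up signatures.
tasks = [(0, "A"), (1, "B"), (2, "A"), (3, "A"), (4, "B"), (5, "C")]
batches = match_threads(tasks, simd_width=2)
for signature, ids in batches:
    print(signature, ids)
```

In this sketch a batch that is not full (such as the lone task with signature `"C"`) leaves SIMD lanes idle, so the achievable lane utilization depends on how many tasks share each signature.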
Tirath Ramdas, Gregory K. Egan, David Abramson, Ki