This paper explores an architecture for the parallel, independent computation of inner products over the direct product ring. The structure is based on the polynomial mapping of the Modulus Replication Residue Number System (RNS), which supports calculations over dynamic ranges much larger than the product of the computational moduli. We show that the computational ring is optimal for our purposes, and introduce basic cells for the efficient calculation of all elements of the polynomial ring computations.
Wenzhe Luo, Graham A. Jullien, Neil M. Wigley, Wil