We present an efficient implementation of the Modified SParse Approximate Inverse (MSPAI) preconditioner. MSPAI generalizes the class of preconditioners based on Frobenius norm mi...
Thomas Huckle, A. Kallischko, A. Roy, M. Sedlacek,...
Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory...
This paper presents a methodology to efficiently explore the design space of communication adapters. In most digital signal processing (DSP) applications, the overall performance ...
Cyrille Chavet, Philippe Coussy, Pascal Urard, Eri...
This paper solves the open problem of extracting the maximal number of iterations from a loop that can be executed in parallel on chip multiprocessors. Our algorithm solves it opt...
Duo Liu, Zili Shao, Meng Wang, Minyi Guo, Jingling...
Application-specific instructions can significantly improve the performance, energy, and code size of configurable processors. A common approach used in the design of such instruc...