Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory...
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64(C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...
Comprehensive exploration of the design space parameters at the system-level is a crucial task to evaluate architectural tradeoffs accounting for both energy and performance const...
William Fornaciari, Donatella Sciuto, Cristina Sil...
In data dominated applications, like multi-media and telecom applications, data storage and transfers are the most important factors in terms of energy consumption, area and syste...
Erik Brockmeyer, Arnout Vandecappelle, Sven Wuytac...
Recent advances in Field-Programmable Gate Arrays (FPGA) and programmable interconnects have made it possible to build efficient hardware emulation engines. In addition, improveme...