This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to op...
Yanwei Niu, Ziang Hu, Kenneth E. Barner, Guang R. ...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
In this paper we study the performance improvements and trade-offs derived from an optimized mapping approach applied on a parametric coarse grained reconfigurable array architect...
Grigoris Dimitroulakos, Michalis D. Galanis, Const...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
We describe a software architecture for storage services in computational grid environments. Based upon a lightweight message-passing paradigm, the architecture enables the provis...