Memory system optimizations have been well studied on single-threaded systems; however, the wide use of simultaneous multithreading (SMT) techniques raises questions over their ef...
For high-data-rate applications, turbo-like iterative decoders are implemented with parallel hardware architecture. However, to achieve high throughput, concurrent accesses to each...
Awais Sani, Philippe Coussy, Cyrille Chavet, Eric ...
UPC’s implicit communication and fine-grain programming style make application performance modeling a challenging task. The correspondence between remote references and communi...
We propose an execution model that orchestrates the fine-grained interaction of a conventional general-purpose processor (GPP) and a high-speed reconfigurable hardware accelerator ...
The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Access to remote memory is likely to ...
Edward D. Moreno, Sergio Takeo Kofuji, Marcelo H. ...