Understanding why the performance of a multithreaded program does not improve linearly with the number of cores in a sharedmemory node populated with one or more multicore process...
Abstract—We present a new low-level interfacing scheme for connecting custom accelerators to processors that tolerates latencies that usually occur when accessing hardware accele...
Jaroslav Sykora, Leos Kafka, Martin Danek, Lukas K...
There is building interest in using FPGAs as accelerators for high-performance computing, but existing systems for programming them are so far inadequate. In this paper we propose...
Algorithmic choice is essential in any problem domain to realizing optimal computational performance. Multigrid is a prime example: not only is it possible to make choices at the ...
Cy P. Chan, Jason Ansel, Yee Lok Wong, Saman P. Am...
strokes makes it particularly attractive in terms of storage size and transmission efficiency. Its efficient implementation is favourable for incorporation into windowing systems a...