The memory hierarchy of most multicore systems contains one or more levels of cache that is shared among multiple cores. The shared-cache architecture presents many opportunities f...
— Fast Fourier transform (FFT) algorithms are used in a wide variety of digital signal processing applications and many of these require high-performance parallel implementations...
Parallel and distributed computing infrastructure are increasingly being embraced in the context of manufacturing applications, including real-time scheduling. In this paper, we pr...
This paper surveys and places into perspective a number of results concerning the D-BSP (Decomposable Bulk Synchronous Parallel) model of computation, a variant of the popular BSP ...
Gianfranco Bilardi, Carlo Fantozzi, Andrea Pietrac...
Abstract. Performance modeling is important for implementing efficient parallel applications and runtime systems. The LogP model captures the relevant aspects of message passing i...