Many image and signal processing kernels can be optimized for performance consuming a reasonable area by doing loops parallelization with extensive use of pipelining. This paper p...
Zubair Nawaz, Thomas Marconi, Koen Bertels, Todor ...
Understanding why the performance of a multithreaded program does not improve linearly with the number of cores in a sharedmemory node populated with one or more multicore process...
Superscalar microprocessor efficiency is generally not as high as anticipated. In fact, sustained utilization below thirty percent of peak is not uncommon, even for fully optimized...
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information ...
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. P...
We use reconfigurable hardware to construct a high throughput Bayesian computing machine (BCM) capable of evaluating probabilistic networks with arbitrary DAG (directed acyclic gr...