The architecture for a shared memory CPU is described. The CPU allows for parallelism down to the level of single instructions and is tolerant of memory latency. All executable in...
Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [1–3]...
In contrast to the common belief that OpenMP requires data-parallel extensions to scale well on architectures with non-uniform memory access latency, recent work has shown that it ...
One approach in verifying the correctness of a multiprocessor system is to show that its execution results comply with the memory consistency model it is meant to implement. It ha...
Machine learning techniques are applicable to computer system optimization. We show that shared memory multiprocessors can successfully utilize machine learning algorithms for mem...
M. F. Sakr, Steven P. Levitan, Donald M. Chiarulli...