Deeply pipelined high performance processors require highly accurate branch prediction to drive their instruction fetch. However there remains a class of events which are not easi...
This paper describes the implementation and evaluation of the OpenMP compiler designed for the Hitachi SR8000 Super Technical Server. The compiler performs parallelization for the ...
This paper describes transparent mechanisms for emulating some of the data distribution facilities offered by traditional data-parallel programming models, such as High Performance...
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor...
Partitioned parallel radix sort is a parallel radix sort that shortens the execution time by modifying the load balanced radix sort which is known one of the fastest internal sort...
Shin-Jae Lee, Minsoo Jeon, Andrew Sohn, Dongseung ...
For the past two decades, developments in DRAM technology, the primary technology for the main memory of computers, have been directed towards increasing density. As a result 256 M...
Traditional parallel compilers do not effectively parallelize irregular applications because they contain little looplevel parallelism due to ambiguous memory references. We explo...
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technolo...
Isosurface generation algorithms usually need a vertex-identification process since most of polygon-vertices of an isosurface are shared by several polygons. In our observation the...