Abstract. Clocks are a mechanism for providing synchronization barriers in concurrent programming languages. They are usually implemented using primitive communication mechanisms a...
Abstract. We introduce the design and implementation of dynamic separation (DS) as a programming discipline for using transactional memory. Our approach is based on the programmer ...
Register allocation decides which parts of a variable's live range are held in registers and which in memory. The compiler inserts spill code to move the values of variables b...
This paper presents a compiler which produces machine code from functions defined in the logic of a theorem prover, and at the same time proves that the generated code executes the...
Magnus O. Myreen, Konrad Slind, Michael J. C. Gord...
form uses a notational abstractions called -functions. These instructions have no analogous in actual machine instruction sets, and they must be replaced by ordinary instructions ...
In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implement...
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
As computational power increases, tele-immersive applications are an emerging trend. These applications make extensive demands on computational resources through their heavy use o...
Albert Sidelnik, I-Jui Sung, Wanmin Wu, Marí...
Even though graphics processors (GPUs) are becoming increasingly popular for general purpose computing, current (and likely near future) generations of GPUs do not provide hardwar...