This paper presents a middleware capable of out-of-order execution of kernels and data transfers for efficient stream processing in the compute unified device architecture (CUDA). ...
Power consumption has become an increasingly important constraint in high-performance computing systems, shifting the focus from peak performance towards improving power efficiency...