Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have ‘compatible’ demands on the processor’s shared resour...
As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows likewise. While this is a clear benefit for f...
A traditional fixed-function graphics accelerator has evolved into a programmable general-purpose graphics processing unit over the last few years. These powerful computing cores...
In this paper we evaluate the atomic region compiler abstraction by incorporating it into a commercial system. We find that atomic regions are simple and intuitive to integrate i...
Naveen Neelakantam, David R. Ditzel, Craig B. Zill...
The Local Outlier Factor (LOF) is a very powerful anomaly detection method available in machine learning and classification. The algorithm defines the notion of local outlier in...
Scalable heterogeneous computing systems, which are composed of a mix of compute devices, such as commodity multicore processors, graphics processors, reconfigurable processors, ...
Anthony Danalis, Gabriel Marin, Collin McCurdy, Je...
Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. Thi...
SIMD (Single Instruction, Multiple Data) engines are an essential part of the processors in various computing markets, from servers to the embedded domain. Although SIMD-enabled a...
Amir Hormati, Yoonseo Choi, Mark Woh, Manjunath Ku...
This paper presents a randomized scheduler for finding concurrency bugs. Like current stress-testing methods, it repeatedly runs a given test program with supplied inputs. Howeve...
Sebastian Burckhardt, Pravesh Kothari, Madanlal Mu...