Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation, in order to select the best-performing solution for...
Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun...
In this paper, we propose a novel hardware caching technique, called switch directory, to reduce the communication latency in CC-NUMA multiprocessors. The main idea is to implemen...
We present a method for finding efficient instruction sequences for the Serpent S-boxes. Current implementations need many registers to store temporary variables, yet the common ...
Problem determination in today's computing environments consumes between 30 and 70% of an organization’s IT resources and represents from one third to one half of their tot...
We present an efficient method for mutual information (MI) computation between images (2D or 3D) for NVIDIA’s ‘compute unified device architecture’ (CUDA) compatible devic...