Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understand...
Nathan R. Tallent, John M. Mellor-Crummey, Michael...
Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed w...
In this paper, we present a real-time algorithm for 3D object detection in images. Our method relies on the theory of Ullman and Basri [13], which claims that the same object under di...
Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fu...
Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben...
The growing speed gap between memory and processor makes efficient use of the cache ever more important for reaching high performance. One of the most important ways to improve cac...