Thread-level speculation provides architectural support to aggressively run hard-to-analyze code in parallel. As speculative tasks run concurrently, they generate unsafe or specul...
We address the problem of efficient out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions arising in quantum chemistry computati...
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64(C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...
Dynamically tagged directories are memory-efficient mechanisms for maintaining cache coherence in sharedmemory multiprocessors. These directories use specialpurpose caches of poin...
High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, timetested, meritorious code that are impractical...
Tristan Ravitch, Steve Jackson, Eric Aderhold, Ben...