DTA (Decoupled Threaded Architecture) is designed to exploit fine/medium grained Thread Level Parallelism (TLP) by using a distributed hardware scheduling unit and relying on exi...
We introduce a software/hardware scheme called the Field Array Compression Technique (FACT) which reduces cache misses due to recursive data structures. Using a data layout transfo...
Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs, namely generation of fault-span, the set of states re...
Fuad Abujarad, Borzoo Bonakdarpour, Sandeep S. Kul...
Memory performance is increasingly determining microprocessor performance and technology trends are exacerbating this problem. Most architectures use set-associative caches with L...
Zhenlin Wang, Kathryn S. McKinley, Arnold L. Rosen...
In this paper, we propose Runahead Threads (RaT) as a valuable solution for both reducing resource contention and exploiting memory-level parallelism in Simultaneous Multithreaded...