Exploiting compile time knowledge to improve memory bandwidth can produce noticeable improvements at run-time [13, 1]. Allocating the data structure [13] to separate memories when...
Execution-driven simulators are often used for power/energy and performance evaluation. Simulators can provide semantic details but they provide insufficient speed and accuracy f...
Interconnection networks have been deployed as the communication fabric in a wide range of parallel computer systems. With recent technological trends allowing growing quantities ...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand real-world phenomena. While there has been a great deal of work on s...
Guozhang Wang, Marcos Antonio Vaz Salles, Benjamin...