In chip-multiprocessors (CMPs), the number of cores and the issue width of each core present an important design trade-off to balance the amount of TLP and ILP between multi-thre...
Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. ...
We describe a cache architecture, intended for prototype-oriented IC platforms, that automatically finds the best cache configuration for a particular application. The cache itself ...
We reopen the issue of finding an implicit data structure for the dictionary problem. In particular, we examine the problem of maintaining n data values in the first n locatio...
Gianni Franceschini, Roberto Grossi, J. Ian Munro,...
Data parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has b...