This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters ...
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and latency. The early knowledge of the data cache bank that an instruction will access ...
Stefan Bieschewski, Joan-Manuel Parcerisa, Antonio...
Despite their superior performance for multimedia applications, vector processors have three limitations that hinder their widespread acceptance. First, the complexity and size of...
Parameter variation due to manufacturing error will be an unavoidable consequence of technology scaling in future generations. The impact of random variation in physical factors s...
Ke Meng, Frank Huebbers, Russ Joseph, Yehea I. Ism...
Some instructions have more impact on processor performance than others. Identification of these critical instructions can be used to modify and improve instruction processing. Pr...
Samantika Subramaniam, Anne Bracy, Hong Wang 0003,...