As superscalar processors become increasingly wide, it is inevitable that the large set of instructions to be fetched every cycle will span multiple noncontiguous basic blocks. Th...
Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduct...
We present the Buffer Heap (BH), a cache-oblivious priority queue that supports Delete-Min, Delete, and Decrease-Key operations in O( 1 B log2 N B ) amortized block transfers fro...
Cache locality optimization is an efficient way for reducing the idle time of modern processors in waiting for needed data. This kind of optimization can be achieved either on the...
Despite the progress of the last decades in electronic design automation, arithmetic circuits have always received way less attention than other classes of digital circuits. Logic...