Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
The run-time performance of VLIW (very long instruction word) microprocessors depends heavily on the effectiveness of its associated optimizing compiler. Typical VLIW compiler pha...
In this work we examine a task scheduling and data migration problem for Grid Networks, which we refer to as the Data Consolidation (DC) problem. DC arises when a task needs for i...
Panagiotis C. Kokkinos, Kostas Christodoulopoulos,...
Memory-intensive threads can hoard shared resources without making progress on a multithreading processor (SMT), thereby hindering the overall system performance. A recent promisi...
Simultaneous Multithreading (SMT) attempts to keep a dynamically scheduled processor's resources busy with work from multiple independent threads. Threads with longlatency st...
Samantika Subramaniam, Milos Prvulovic, Gabriel H....