Allowing loads to issue out of order with respect to earlier stores whose addresses are still unresolved is important for extracting parallelism in large-window superscalar processors. However, blindly allowing every load to issue as soon as its address is ready can cause a net performance loss because of the large number of load-store ordering violations that result. Previous research has proposed memory dependence prediction algorithms that hold back only those loads with true memory dependencies while earlier stores remain unresolved. Techniques such as load-store pair identification and store sets have been very successful, achieving performance close to that of an oracle dependence predictor. These techniques tend to employ relatively complex CAM-based designs, which we believe have been obstacles to their industrial adoption. In this paper, we use the idea of dependency vectors from matrix schedulers for non-memory instructions and adapt them to implement a new dependence p...
Samantika Subramaniam, Gabriel H. Loh