We introduce techniques to support efficient non-atomic execution of very long traces on a new binary translation based, x86-64 compatible VLIW microprocessor. Incrementally committed long traces significantly reduce wasted computations on exception induced rollbacks by retaining the correctly committed parts of traces. We divide each scheduled trace into multiple commit groups; groups are committed to the architectural state after all instructions within and prior to each group complete without exceptions. Architectural state updates are only visible after future commit points are deferred using a simple hardware commit buffer. We employ a commit depth predictor to predict how many groups a trace will complete, thereby eliminating pipeline flushes on repeated rollbacks. Unlike atomic traces, we allow instructions to be freely scheduled across commit points throughout the trace to maximize ILP. Commit groups are formed after scheduling, allowing the commit points terminating each g...
Matt T. Yourst, Kanad Ghose