This paper describes the DELFT-JAVA processor and the mechanisms required to dynamically translate JVM instructions into DELFT-JAVA instructions. Using a form of hardware register allocation, we transform stack bottlenecks into pipeline dependencies which are later removed using register renaming and interlock collapsing arithmetic units. When combined with superscalar techniques and multiple instruction issue, we remove up to 60% of translated dependencies. When compared with a realizable stack-based implementation, our approach accelerates a Vector Multiply execution by 3.2x for out-of-order execution with register reanaming and 2.7x when hardware constraints were considered. In addition, for translated instruction streams, we realized a 50% performance improvement for out-of-order execution when compared with in-order execution.
C. John Glossner, Stamatis Vassiliadis